Behind the Line: Analytics

The Controversy

People are often worried about other people violating their privacy. No one wants what is personal to them available for anyone, much less everyone, to examine and scrutinize. So the concept of a company monitoring them can be unnerving, frightening, or even threatening to some users, and it has led to legislation limiting the practice of analytics.

(Please note that all numbers provided in examples are made up on the spot and are in no way derived from any actual usage statistics. Any similarity to any game statistics, live or dead, is entirely coincidental.)

What does ‘Analytics’ mean?

Analytics systems are routines embedded in a game or a program that send usage data back to the developer. At first that might sound unnecessary, at least for an online game. The server receives everything that happens in the game, so it already knows everything there is to know, right?

…well, not exactly.

You see, the game server is already pretty busy running the game. It has to make sure new players can create an account, existing players can log in, players who are playing follow the rules, and everything moves along smoothly.

Since the game server is busy enough, you need a separate system to actually organize and save the data that you want to study. This is where the analytics system comes in. It has its own database that stores only this information, and it talks to that database differently than the game talks to the game server. The game sends it selected data, structured so that it can be arranged to tell the developer specific things.
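To make that separation concrete, here is a minimal Python sketch under stated assumptions: the `AnalyticsClient` class, its `track`/`flush` methods, and the event fields are hypothetical names, and a plain list stands in for the analytics database. The game only appends small event records to a buffer, and full batches are serialized and handed to the separate store. A real client would also ship batches over the network, usually off the game's main thread; that part is left out.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class AnalyticsEvent:
    # Hypothetical event shape: what happened, who did it, when, plus details.
    event: str
    player_id: str
    timestamp: float
    payload: dict[str, Any] = field(default_factory=dict)


class AnalyticsClient:
    """Buffers events so the game code only pays for a cheap list append."""

    def __init__(self, sink: list[str], batch_size: int = 50) -> None:
        self.sink = sink              # stand-in for the separate analytics database
        self.batch_size = batch_size
        self.buffer: list[AnalyticsEvent] = []

    def track(self, event: str, player_id: str, **payload: Any) -> None:
        self.buffer.append(AnalyticsEvent(event, player_id, time.time(), payload))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Serialize each buffered event as one JSON line and hand the batch off.
        self.sink.extend(json.dumps(asdict(e)) for e in self.buffer)
        self.buffer.clear()


store: list[str] = []
client = AnalyticsClient(store, batch_size=2)
client.track("tutorial_step", "player-1", step=1)
client.track("tutorial_step", "player-1", step=2)  # second event triggers a flush
print(len(store), json.loads(store[0])["payload"])  # 2 {'step': 1}
```

Batching is one reasonable design choice here, not the only one: it keeps the per-event cost on the game side small and lets the analytics store receive data in chunks.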

A quick example is a tutorial funnel. I have a game with a tutorial, and I want to know how good it is at introducing people to the game, so in the analytics system I create a tutorial funnel. Every step in the tutorial sends an event to the server, so I can see how long someone takes on each step and, if the events stop, where they fall off. If I have 10 steps in the tutorial, and each step loses 5% of the players who started, then 50% finish the tutorial, and no single step is a particular problem.

However, if I have 10 steps where each step loses 2% of the players who started, except step 6, which loses 32%, then I still have 50% of the players at the end, but step 6 has a really big problem. If I can identify the problem and fix it so that step 6 loses only 2% like the others, I might be able to keep 80% of players through the tutorial!
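The arithmetic behind that funnel can be sketched in a few lines of Python. This assumes we already have a count of how many players reached each step (index 0 being everyone who started), and it uses the same made-up numbers as the example: 1,000 starters, 20 players lost per step, except 320 lost at step 6.

```python
def step_losses(reached: list[int]) -> list[float]:
    """Share of the *starting* players lost at each step."""
    start = reached[0]
    return [(prev - cur) / start for prev, cur in zip(reached, reached[1:])]


# Made-up numbers matching the example: every step loses 20 players
# (2% of starters) except step 6, which loses 320 (32% of starters).
reached = [1000]
for step in range(1, 11):
    reached.append(reached[-1] - (320 if step == 6 else 20))

losses = step_losses(reached)
worst = max(range(len(losses)), key=lambda i: losses[i]) + 1  # 1-based step number
print(f"completion: {reached[-1] / reached[0]:.0%}")  # completion: 50%
print(f"worst step: {worst} ({losses[worst - 1]:.0%} of starters lost)")  # worst step: 6 (32% of starters lost)
```

Measuring every loss against the starting count mirrors how the example adds up (nine steps at 2% plus one at 32% is 50%); a funnel could also report each step's loss relative to the players who reached it, which makes a problem step stand out even more.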

There are a lot of other things analytics can do too. Have you ever seen a heat map of a Counter-Strike map?

[Image: heat map of a Counter-Strike map]

Well, you have now!

The “heat” in this map is where players fire their weapons from, as well as percentages for team choice and weapon type. This can give insight into what makes a good map.

  • Add to that a tally of how often each map is used
  • Add to that a counter to show how experienced each player is in these maps
  • Add to that a representation of how successful (kill/death ratio) each player is in these maps
  • Add to that a count for game settings for each map
  • Add to that a count for what weapons are being fired
  • Add to that a heat map showing where people are when they get killed

For the most part, these data points aren’t difficult to implement. They all have clear events associated with them (a match starts, a weapon is fired, a player dies, etc.) and clear data to send with them (location coordinates, weapon ID, map ID, etc.), which makes those events pretty straightforward. An analyst can then examine the data and use it to build a very intricate picture of what makes one map more popular than another.
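For instance, turning a stream of `weapon_fired` events into a heat map can be as simple as bucketing each event’s coordinates into a grid and counting. This is a minimal Python sketch: the event field names, the `map_a` map ID, and the 64-unit cell size are all assumptions made for illustration.

```python
from collections import Counter

CELL_SIZE = 64.0  # assumed size of one heat-map cell, in map units


def heat_map(events: list[dict], map_id: str, kind: str = "weapon_fired") -> Counter:
    """Count events of one kind per grid cell for one map."""
    cells: Counter = Counter()
    for e in events:
        if e["event"] == kind and e["map_id"] == map_id:
            # Floor division maps a coordinate to the index of its grid cell.
            cell = (int(e["x"] // CELL_SIZE), int(e["y"] // CELL_SIZE))
            cells[cell] += 1
    return cells


events = [
    {"event": "weapon_fired", "map_id": "map_a", "weapon_id": "rifle", "x": 10.0, "y": 20.0},
    {"event": "weapon_fired", "map_id": "map_a", "weapon_id": "pistol", "x": 50.0, "y": 30.0},
    {"event": "weapon_fired", "map_id": "map_a", "weapon_id": "rifle", "x": 300.0, "y": 20.0},
    {"event": "player_died", "map_id": "map_a", "x": 12.0, "y": 22.0},
]
print(heat_map(events, "map_a"))  # Counter({(0, 0): 2, (4, 0): 1})
```

Calling `heat_map(events, "map_a", kind="player_died")` gives the “where people get killed” map from the list above, and counting `weapon_id` instead of grid cells gives the weapon tally.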

What is scary?

Now, you may have already noticed a logical problem with how I’ve described the functionality so far. In the tutorial funnel example, how do you know when a player has dropped out of the funnel? They could come back later and continue. This is handled by the analytics system assigning you an identifier, which is how it can track everything you do and associate it with you. So, going back to the tutorial example, I might be able to determine that if a user completes the tutorial in under 5 minutes, they’re 75% likely to play the game for at least a month, 50% likely to spend money within 1 week, and 33% likely to spend at least $10.
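Both ideas can be sketched in Python under loud assumptions. Here the client generates a random identifier once with the standard library’s `uuid.uuid4()` and reuses it on later launches (a dict stands in for local storage), so a returning player stays the same funnel entry. The per-player summary fields, the 5-minute threshold, and the data are hypothetical and made up like everything else here.

```python
import uuid


def get_or_create_player_id(storage: dict[str, str]) -> str:
    """Return the stored analytics ID, creating a random one on first launch."""
    if "analytics_id" not in storage:
        storage["analytics_id"] = str(uuid.uuid4())
    return storage["analytics_id"]


def retention_given_fast_tutorial(players: list[dict], max_minutes: float = 5.0) -> float:
    """Among players who finished the tutorial in under max_minutes,
    the share who went on to play for 30 or more days."""
    fast = [p for p in players
            if p["tutorial_minutes"] is not None and p["tutorial_minutes"] < max_minutes]
    if not fast:
        return 0.0
    return sum(p["days_played"] >= 30 for p in fast) / len(fast)


local_storage: dict[str, str] = {}
first = get_or_create_player_id(local_storage)
print(get_or_create_player_id(local_storage) == first)  # True: same ID next session

players = [
    {"tutorial_minutes": 3.5, "days_played": 40},
    {"tutorial_minutes": 4.0, "days_played": 35},
    {"tutorial_minutes": 4.5, "days_played": 31},
    {"tutorial_minutes": 2.0, "days_played": 3},
    {"tutorial_minutes": 12.0, "days_played": 60},
    {"tutorial_minutes": None, "days_played": 1},   # never finished the tutorial
]
print(retention_given_fast_tutorial(players))  # 0.75
```

Note that the identifier is random rather than derived from anything about the player, which is what lets the system link sessions together without knowing who the player is.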

This is a lot of information that can be gleaned from someone’s use of a system, and there is more that can be done to try to understand a user. The device you are using can be recorded so the developer can understand its most important platforms. Even the ad you clicked on to get to the game can be tracked, so the developer can know what its best ads are, or what its best-performing demographic is.

Everything about this adds up in such a way that some people feel that it is now encroaching on privacy. The analytics system can develop a profile for the player, potentially tracking everything they do. It feels a lot like you are being watched while you play. If you take that point of view, then it really isn’t a large leap to call the practice an invasion of privacy.

What is unethical?

When we go past simply tracking usage information and start making robust profiles of users, some unsettling things become possible. We can create profiles for types of users, but a system can also target an individual user. Then the biggest problem becomes a case of “Won’t someone please think of the children!” In this case, however, there is absolutely cause for worry.


Some particularly unscrupulous devs will profile and target kids. While this can be bad enough through game mechanics, it is now explicitly illegal to use analytics to profile and target kids in some places.

One of the biggest concerns in terms of exploitation is that a developer will actively target children and manipulate or deceive them into making a real-money purchase. Using systems to identify which users are children, learning what is most effective at drawing them into the game, and then getting them to unknowingly make a purchase with their parents’ money can only be described as predatory.

What is illegal?

Several pieces of legislation have passed in various places that limit what is fair game to track. Because most games aim for a worldwide release, it’s best for a developer to skip anything that any of those laws objects to. This includes anything that can be defined as Personally Identifiable Information (aka PII): name, email address, IP address, Facebook or Twitter account, phone number, address, or anything else that can be directly tied to a user. IP address isn’t really a direct tie, but it is close enough to cause worry, and it is one of the items that is explicitly illegal to track in some places.
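One defensive way to stay clear of PII is to filter every event against an allowlist of reviewed fields before it leaves the game. This is a minimal Python sketch, assuming a hypothetical `ALLOWED_FIELDS` set and event layout; `player_id` here is meant to be a random analytics ID, not an account name or email.

```python
# Hypothetical allowlist: only fields the team has reviewed as non-PII.
ALLOWED_FIELDS = {"event", "player_id", "timestamp", "map_id", "weapon_id",
                  "x", "y", "step", "platform"}


def scrub(event: dict) -> dict:
    """Return a copy of the event containing only allowlisted fields."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}


raw = {"event": "match_start", "player_id": "a1b2c3", "map_id": "map_a",
       "email": "someone@example.com", "ip_address": "203.0.113.7"}
print(scrub(raw))  # {'event': 'match_start', 'player_id': 'a1b2c3', 'map_id': 'map_a'}
```

An allowlist is used rather than a blocklist of known PII fields so that a newly added field is dropped by default until someone decides it is safe to send.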

Essentially, governments are aware of the potential for actual privacy infringement, as well as predatory behavior from unscrupulous developers. Plenty of devs manage to behave shadily enough without access to this kind of information. On top of that, when you can tie user behavior back to an individual, you really are just watching someone play: you have detailed records of who this player is and what they do in the game. It’s Personally Identifiable Information that really crosses the line, because without it you can only take the aggregate of the data to find trends.

What is good?

Plain and simple, there is no better way to understand what people do. When given a survey, people respond with what they want to do rather than what they actually would do. When interviewed, people are keenly aware that they are being observed. An analytics system quietly tracks how people actually play the game: what they like, what they don’t, how people who play the game longer behave, or pretty much anything else you can phrase as a logical, data-centric question.

This goes beyond the simple accusation of exploiting addictive behavior in free-to-play games, beyond learning how to design better games, and really beyond games altogether. Who knows more about the viewing habits of the public: Nielsen, or Netflix? One is based on sampling the public, and the other observes data from EVERY USER EVERYWHERE EVER. Netflix can use this information to understand what people really like to watch, and use that to decide which new shows to finance. It’s not surprising, then, that they’ll go for acclaimed series like “Orange Is the New Black” or “House of Cards”.

There are more applications, as well. You may have seen those IBM commercials talking about analytics.

Those commercials give a pretty good quick explanation of the potential applications, and those applications will continue to grow in the future. We will be able to construct more and more sophisticated models to understand whatever system we are studying.

What can go wrong?

Steering clear of the lines that shouldn’t be crossed doesn’t mean that nothing else can go wrong with an analytics system. Of course, the worst possibility would be someone collecting PII and then suffering a data breach. Then not only would the users have had their privacy violated, they could also be vulnerable to identity theft from even more unscrupulous parties.

However, even if someone gathering data does the right thing and avoids any PII, they could still be prone to other errors. An analytics system will only record the data you tell it to. If you are not careful, you may not collect enough data to draw any valuable conclusions. Back to the tutorial funnel example: let’s say I mark 10 points in a tutorial that actually has 100 transitions or actions, and that I expect to take a user up to 20 minutes to complete. If I notice a drop-off, I can only narrow it down to somewhere between the points I set up. Instead, I should have an event for every player action, so I can find the specific action that causes players problems. This seems fairly self-evident in this example, but other aspects of the game may require very careful thought about how to set up their events so you can glean useful information from them.
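A rough Python sketch of what coarse instrumentation costs, using made-up data: suppose we know the last action each player completed in a 100-action tutorial (0 meaning none). The hypothetical `worst_window` function simulates what a system that only sends an event every `checkpoint_every` actions could conclude about where players quit.

```python
from collections import Counter


def worst_window(last_completed: list[int], total_actions: int,
                 checkpoint_every: int) -> tuple[int, int]:
    """Return the (first, last) action range where the most players quit,
    as seen by a system that only sends an event every `checkpoint_every` actions."""
    # A player who last completed action a quit on action a + 1, but the system
    # last heard from them at checkpoint (a // k) * k, so it only knows they quit
    # somewhere in the next window of k actions. Finishers are excluded.
    windows = Counter(a // checkpoint_every for a in last_completed if a < total_actions)
    window, _ = windows.most_common(1)[0]
    first = window * checkpoint_every + 1
    return first, min(first + checkpoint_every - 1, total_actions)


# Made-up data for a 100-action tutorial: most quitters get stuck on action 57.
last_completed = [100] * 500 + [56] * 300 + [10] * 100 + [80] * 100
print(worst_window(last_completed, 100, 10))  # (51, 60): 10 checkpoints
print(worst_window(last_completed, 100, 1))   # (57, 57): an event per action
```

With 10 checkpoints the problem is somewhere in a 10-action window; with an event per action it points at action 57 directly.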

Even if you have good data and a robust schema, it still takes care to avoid drawing an incorrect conclusion. Again, let’s look at the tutorial funnel example. Let’s say we do have the one step that drops over 30% of users. The most apparent possibility is that users have a problem with that step and don’t want to continue. However, what if all of those users run a particular graphics card, and that step in the tutorial crashes on that graphics card? Cross-referencing the data to find solid trends can be important, and it’s not always apparent what needs to be cross-referenced, so it can take a lot of skill to do this well.
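Here is a minimal Python sketch of that cross-reference, assuming each player record carries a hypothetical `gpu` field and a `last_step` field holding the furthest tutorial step they reached (11 meaning they finished the 10-step tutorial). The GPU names and counts are made up.

```python
from collections import defaultdict


def dropoff_by_gpu(players: list[dict], step: int) -> dict[str, float]:
    """For players who reached `step`, the share who never got past it, per GPU model."""
    reached: dict[str, int] = defaultdict(int)
    stuck: dict[str, int] = defaultdict(int)
    for p in players:
        if p["last_step"] >= step:
            reached[p["gpu"]] += 1
            if p["last_step"] == step:
                stuck[p["gpu"]] += 1
    return {gpu: stuck[gpu] / reached[gpu] for gpu in reached}


# Made-up data: 32 of 100 players stall at step 6, which looks like a design
# problem until it is split by graphics card.
players = (
    [{"gpu": "gpu_x", "last_step": 6}] * 30
    + [{"gpu": "gpu_y", "last_step": 6}] * 2
    + [{"gpu": "gpu_y", "last_step": 11}] * 68
)
rates = dropoff_by_gpu(players, step=6)
print({gpu: round(r, 2) for gpu, r in rates.items()})  # {'gpu_x': 1.0, 'gpu_y': 0.03}
```

Split this way, the step looks fine on one card and fails for everyone on the other, which points toward a crash rather than a confusing tutorial.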

What do I think?

Personally, I’ve always thought that the backlash against analytics is a bit overblown. If I’m using a networked application, then I have no expectation of privacy with how I’m using it. It is inherent in the use of the application that I have to tell someone what I am doing for it to work. If the developer is recording that information then that’s simply them remembering what I have told them.

Plenty of players get irate at the idea that, if something goes wrong, an admin at customer support can’t roll back their game data to an earlier state. Never mind that doing this would require something entirely different; there’s still a tacit acknowledgement there that the game should be keeping detailed records of what the player is doing. I would hope that none of these players would ever complain about an analytics system tracking their behavior.

Perhaps the key is that, as a society, our understanding of privacy will have to change. Instinctively, many of these systems feel private because we may not be interacting with another person or actively attaching our names to what we’re doing. The fact of the matter, though, is that for any online interaction to work, you MUST transmit your IP address to the server. While that doesn’t tie online activity directly to a user, it can leave signs pointing to the user. So, barring a clever VPN mask or something similar, nothing online has ever been fully anonymous anyway. There’s also the social media angle, like in the IBM commercial mentioned above, where these aren’t private communications. If you’re sharing with the world that you got a new bike, you shouldn’t get upset that someone listened to you.

In the end, though, you gotta remember that these systems aren’t there to track YOU. Really, no one following this data cares about YOU as an individual. Any information on YOU alone is not valuable from a development standpoint. The value comes when there’s data for hundreds, thousands, or hundreds of thousands of users. The more data points there are, the more accurate the aggregate is, and the aggregate is the only thing anyone really cares about.
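One standard way to put a number on “more data, more accurate aggregate” is the standard error of an estimated proportion, sqrt(p(1 - p) / n), which shrinks with the square root of the number of users n. This Python sketch assumes independent users and a true rate near 50% (the worst case), and prints an approximate 95% margin (1.96 standard errors).

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent users."""
    return math.sqrt(p * (1 - p) / n)


for n in (100, 1_000, 100_000):
    print(n, f"±{1.96 * standard_error(0.5, n):.1%}")
# 100 ±9.8%
# 1000 ±3.1%
# 100000 ±0.3%
```

A rate measured on 100 players could easily be off by ten points; the same rate measured on 100,000 players is pinned down to a fraction of a point, and no individual player’s data matters much to it.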

Speaking for myself, I know I gave them the data, and I think they can use it. They just shouldn’t do something stupid, like illegally tie it directly to me and then sell the data to someone else.


Kynetyk is a veteran of the games industry. Behind the Line is written to help improve understanding of what goes on in the game development process and the business behind it. From “What’s taking this game so long to release?” to “Why are there bugs?” to “Why is this free to play?” or anything else, if there is a topic that you would like to see covered, please write in to kynetyk@enthusiacs.com
