Importance of Context in Analytics

Context: England crashed out of the Cricket World Cup after losing to a smaller team (Bangladesh, no disrespect intended), and their coach responded by talking about looking at the data, which, as expected, hasn't gone down well.

I found a few key quotes in this article that illustrate the need to balance measurements with sensibility:

  1. "England have, under Moores, known the price of everything but the value of nothing."
  2. "But the data helped imprison them. ...The data, doubtless, meant the batting order could never be adjusted to take advantage of or respond to a changed set of circumstances. The data said this and that and turned intelligent cricketers - for such we assume they must be – into morons."
  3. "...the data doesn’t mean nearly as much as you think it does there comes a point when your theory has to be reconsidered."

    and finally

  4. "It’s tough to measure this sort of thing, however, so instead we assume that what’s important is what can be measured. That isn’t the case. It leads to a bad place."


These quotes succinctly contextualize several facets of data analysis:

  • the push-pull relationship between analysis and action
  • asking the right questions
  • measuring the right things
  • utilizing domain expertise
  • understanding stakeholders and their latent needs
  • communicating and acting on insights
  • knowing when, where and how to rely on data

Amidst the cacophony of complex statistics, models and a variety of tools, it is important to remember that ALL organizations are results-driven.

Translating complex technical insights for a broader audience that is juggling several priorities is already hard enough. Even so, a data analyst is responsible for using the available technical resources and organizational strategies to deliver the results that businesses and organizations demand.

To that end, the context of analytics usage is paramount.

A Philosophical Study of Random Walks

I've been taking a class in Statistical Analysis with Prof. Yuri Balasanov and we recently covered the concept of a Random Walk.

Random Walks are interesting to study because they expose the frailty of our grasp of probabilistic thinking. They are also interesting from a philosophical point of view. A minor class discussion point moved me immensely and inspired me to write this post to illustrate the power of choices.


What is a Random Walk?

In very simple layman's terms, a random walk is a succession of random steps with a probability associated with each outcome. This is typically illustrated using a (fair) coin toss.

Let's illustrate these points and test our intuition using a fair coin, i.e., a coin whose probability of Heads = probability of Tails = 0.5.


Expt 1: If I flipped the coin 100,000 times, what value would the proportion of tails converge to?

If you guessed 0.5, you're right! As the graph below shows, over a large enough sample, the proportion of tails converges to 0.5.

Graph 1: Convergence of tails
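
The experiment can be reproduced with a few lines of base R. This is a minimal sketch rather than the code behind the original graph: the seed and the variable names are arbitrary choices made for illustration.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible

n_flips <- 100000

# 1 = tails, 0 = heads, each with probability 0.5
flips <- rbinom(n_flips, size = 1, prob = 0.5)

# Proportion of tails observed after each flip
running_prop <- cumsum(flips) / seq_len(n_flips)

# Proportion after 10, 1,000 and 100,000 flips
running_prop[c(10, 1000, n_flips)]

# Log-scaled x axis makes the noisy early flips visible
plot(running_prop, type = "l", log = "x",
     xlab = "Number of flips", ylab = "Proportion of tails")
abline(h = 0.5, lty = 2)
```

The early part of the line swings widely, and the swings shrink as more flips accumulate.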

So far, so good.


Expt 2: Let's play a game. I give you a dollar each time the coin lands tails and take away a dollar each time it lands heads. If we flipped the coin a million times, how much money do you think you would make?

Given the results of the first experiment, most people would guess that the profits and losses would net themselves out and we would end up with a zero profit.

Graph 2: Random Walks

But in reality, there are several possibilities, as illustrated by the graph above. The graph shows several possible cumulative paths, each built from steps that are independent of the past. This is the core illustration of a Random Walk, and this is the point at which I had a mini "Aha!" moment. This is also what Nassim Taleb refers to as the concept of Alternate Histories in his book Fooled by Randomness.
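
A handful of such games can be simulated side by side in base R. This is a sketch under stated assumptions, not the code behind the original graph: it uses 10,000 flips instead of a million so the plot draws quickly, and the seed and the number of paths are arbitrary.

```r
set.seed(2015)  # arbitrary seed for reproducibility

n_flips <- 10000  # fewer than a million so the plot draws quickly
n_paths <- 5      # number of independent games to compare

# Each column is one game: +1 for tails, -1 for heads
steps <- matrix(sample(c(-1, 1), n_flips * n_paths, replace = TRUE),
                nrow = n_flips, ncol = n_paths)

# Running winnings in each game (one column per game)
paths <- apply(steps, 2, cumsum)

# Final winnings in each game
paths[n_flips, ]

matplot(paths, type = "l", lty = 1,
        xlab = "Number of flips", ylab = "Cumulative winnings ($)")
abline(h = 0, lty = 2)
```

Changing the seed produces a different set of paths, which is the point: each run is one of many possible histories.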

This basic set of graphs illustrates two fundamentally simple, yet profound points:

  1. A random walk does not converge to its mean. In simple English: while the probability of each individual event may be 50-50, the cumulative outcome can vary significantly from the expected average outcome. This has significant implications in financial markets, but also in day-to-day matters such as individual performances and individual choices.

  2. The graphs demonstrate possible outcomes from time t = 0 to t = INF. Some graphs consistently go up, others go down while yet others oscillate. Any time an individual plays this game, s/he is already pinned to a single path of outcomes. However, and this was my fundamental insight, at any time t = N, an individual ALWAYS HAS A CHOICE to play or not to play.
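
Point 1 can look like it contradicts the first experiment, so a short derivation may help. It assumes independent fair flips; the symbols X_i (the payoff of flip i), S_n (cumulative winnings after n flips) and T_n (number of tails after n flips) are introduced here for illustration only.

```latex
\[
X_i \in \{+1, -1\}, \qquad P(X_i = +1) = P(X_i = -1) = \tfrac{1}{2}, \qquad S_n = \sum_{i=1}^{n} X_i
\]
\[
E[S_n] = 0, \qquad \operatorname{Var}(S_n) = n, \qquad \operatorname{SD}(S_n) = \sqrt{n}
\]
\[
T_n = \frac{n + S_n}{2}, \qquad \frac{T_n}{n} = \frac{1}{2} + \frac{S_n}{2n}, \qquad \operatorname{SD}\!\left(\frac{T_n}{n}\right) = \frac{1}{2\sqrt{n}} \to 0
\]
```

The expected winnings are zero, but their typical spread grows like the square root of the number of flips: about 1,000 dollars after a million flips. The proportion of tails divides that spread by n, which is why it settles at 0.5 while the winnings do not settle at zero.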

Graph 3: Illustration of Choice

This is because at any time t = N, although there are several possible paths and an outcome cannot deviate from its individual path, the player always has a choice: play for the next outcome, or do not play. That choice moves the player off the current path and onto a new one, created by the choice itself and an equiprobable outcome. In effect, a player cannot switch paths, but can invent a new path through his or her choices.
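
One way to picture this is to compare a player who keeps playing with one who walks away at time N. The sketch below is only an illustration of the idea: the stopping time of 400, the seed and the variable names are hypothetical, and the original graph may have been built differently.

```r
set.seed(7)  # arbitrary seed for reproducibility

n_flips <- 1000
stop_at <- 400  # hypothetical time N at which the player walks away

steps <- sample(c(-1, 1), n_flips, replace = TRUE)

# Path if the player keeps playing every round
keeps_playing <- cumsum(steps)

# Path if the player stops at N: later flips no longer change the winnings
played <- seq_len(n_flips) <= stop_at
walks_away <- cumsum(steps * played)

# Final winnings under each choice
c(keeps_playing = keeps_playing[n_flips],
  walks_away    = walks_away[n_flips])
```

Both paths are identical up to N. After that, the choice not to play freezes one of them while the other keeps wandering.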

From a philosophical standpoint, I wonder if this is what we call destiny.

R Basics Workshop (at UChicago)

I conducted an R Workshop at the University of Chicago. This post contains the materials and code adapted from a Slidify presentation. The full presentation and code can be found on this repo.

Agenda

  1. Installing and Loading Packages
  2. Reading Data into R - using jsonlite package
  3. Reshaping Data (Wide to Long format) - using reshape2 package
  4. Data Manipulation - using dplyr package
  5. Data Visualization - using ggplot2 package

Installing Packages

  • install.packages("packagename")
  • library(packagename)

Exercise 1: Install and load these packages into the R environment

  • httr
  • jsonlite
  • dplyr
  • ggplot2
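
One possible approach to Exercise 1 is sketched below. `load_or_install` is a hypothetical helper written for this post, not a function from any package; it installs a package only when it is missing and then loads it.

```r
pkgs <- c("httr", "jsonlite", "dplyr", "ggplot2")

# Hypothetical helper: install a package only if it is missing, then load it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Uncomment to run for the four workshop packages
# (installing needs an internet connection and a CRAN mirror):
# invisible(lapply(pkgs, load_or_install))
```

`library()` normally takes an unquoted name, so `character.only = TRUE` is needed when the package name is held in a variable.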

Reading Data into R

Exercise 2: Fetch your IP address
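
A possible starting point for Exercise 2 is sketched below. The workshop's original endpoint is not shown here, so the URL `https://httpbin.org/ip` and the `origin` field are assumptions, and the address in the sample string is a documentation-only placeholder rather than real output.

```r
library(jsonlite)

# A JSON string shaped like the response of an IP-echo service.
# The address is a placeholder, not a real result.
sample_json <- '{"origin": "203.0.113.7"}'

# fromJSON() turns the JSON object into a named list
parsed <- fromJSON(sample_json)
parsed$origin

# fromJSON() also accepts a URL. Uncomment to query the assumed service
# (needs an internet connection):
# my_ip <- fromJSON("https://httpbin.org/ip")$origin
```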


Data Manipulation

  • dplyr
  • Key features
    • tbl_df()
    • %>% chaining operator
    • Verbs
      • select
      • filter
      • arrange
      • mutate
      • summarise
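
The verbs above can be chained as in the sketch below. The workshop's sports dataset is not included in this post, so the built-in `mtcars` data stands in for it, and the metric `mpg_per_1000lb` is made up for illustration. `group_by()` is not in the list above but is commonly paired with `summarise()`. In current dplyr releases `tbl_df()` is deprecated, so the sketch uses `as_tibble()` instead.

```r
library(dplyr)

# mtcars ships with R and stands in for the workshop's sports dataset
cars <- as_tibble(mtcars, rownames = "model")

efficiency <- cars %>%
  select(model, cyl, mpg, wt) %>%         # keep a few columns
  filter(cyl %in% c(4, 6)) %>%            # keep 4- and 6-cylinder cars
  mutate(mpg_per_1000lb = mpg / wt) %>%   # wt is recorded in 1000 lbs
  arrange(desc(mpg_per_1000lb))           # highest value first

# One summary row per cylinder count
by_cyl <- efficiency %>%
  group_by(cyl) %>%
  summarise(avg_mpg_per_1000lb = mean(mpg_per_1000lb), n_cars = n())

by_cyl
```

Each verb takes a data frame and returns a data frame, which is what lets `%>%` pass the result of one step straight into the next.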

Exercise 3a: Come up with a metric for the sports dataset and use dplyr to generate it

Exercise 3b: Explain why the metric is relevant


Data Visualization

  • ggplot2: Grammar of Graphics
    • Plots as objects
    • Layering aesthetics and options
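
The two ideas above can be sketched as follows, again with the built-in `mtcars` data standing in for the sports dataset. The choice of variables and labels is illustrative only.

```r
library(ggplot2)

# A plot is an object: this stores the data and aesthetics, with no layers yet
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Layers and options are added to the object with +
p <- p +
  geom_point(aes(colour = factor(cyl))) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Printing the object is what draws it
print(p)
```

Because the plot is an ordinary object, it can be stored, extended with further layers later, or passed to `ggsave()`.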

Exercise 4: Visualize the sports dataset metric


Summary

  • Installing and Loading packages in R
  • Reading data
    • read.csv
    • jsonlite package
  • Data munging using dplyr
  • Visualization using ggplot2
  • Metrics
    • Context
    • Relevance
    • Communication

Hello World

Hey there! My name is Narayanan "Naveen" Venkataraman. I am a student at The University of Chicago studying Analytics. My GitHub contains code for the projects I'm currently working on, including classroom work, a writing portfolio and open source contributions.

R projects
  1. Currency Arbitrage
  2. R For Excel Users - In partnership with John Taveras