Direct Data

By Jessy

It is the end of science. Science as hypothesis is giving way to science as statistical inference from large data sets. We’re finding in many cases that data mining algorithms applied to loosely specified models lead to better results than finely tuned models derived from theoretical equations [see “The Unreasonable Effectiveness of Data” (pdf)].

It’s not just that there’s more data (though that is certainly true). There are also entirely new categories of data. One of these is something I call ‘Direct Data’. Direct data is already enabling an entirely new approach to the quantitative study of patterns in human behaviour.

So-called direct data is data generated as a product of human interactions with, or usage of, technology. It would not be generated without those interactions, but nor is it created as an add-on or afterthought. It is a by-product of our use of digital media. Think about packet streams from browsing activities, positional data from mobile phones, or statistics of word usage from analyses of an author’s works.

As a result of this (indirect!) derivation process, the data is in many ways a more direct capture of human behaviours and intentions than data sets which seek to measure these things explicitly.

Why is this interesting? Because this data is produced as a by-product, it is notably benevolent. For example, information about my wake/sleep cycles derived from email timestamps or browsing history is unlikely (in aggregate) to be incorrect* or influenced by subjective opinions or conditions. Compare this to the traditional method of surveys that ask people to assess their own sleep/wake cycles, and it’s clear that the potential for (more) objectively measuring our behaviour could lead to entirely new areas of discovery and research.
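To make the wake/sleep example a little more concrete, here is a minimal Python sketch of one way such a cycle might be estimated from timestamps alone. It is only an illustration under stated assumptions: the function names `hourly_activity` and `longest_quiet_window` are hypothetical, the timestamps are synthetic, and the "longest run of inactive hours" heuristic is just one plausible choice, not an established method. It also assumes the timestamps are already in the person's local time.

```python
from collections import Counter
from datetime import datetime


def hourly_activity(timestamps):
    """Count events (e.g. sent emails) per hour of day, 0-23."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def longest_quiet_window(counts, threshold=0):
    """Return (start_hour, length) of the longest run of hours whose
    activity is <= threshold, allowing the run to wrap past midnight."""
    quiet = [c <= threshold for c in counts]
    if all(quiet):
        return (0, 24)
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    # Walk the clock twice so a run such as 23:00-07:00 is seen as one run.
    for i in range(48):
        hour = i % 24
        if quiet[hour]:
            if run_len == 0:
                run_start = hour
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return (best_start, best_len)


# Synthetic data (an assumption for illustration): one event per hour
# between 07:00 and 22:00 on three consecutive days.
sample = [
    datetime(2009, 11, day, hour, 15)
    for day in range(1, 4)
    for hour in range(7, 23)
]
start, length = longest_quiet_window(hourly_activity(sample))
print(f"Likely inactive window starts at {start}:00 and lasts {length} hours")
```

With real data the threshold would need tuning, since a handful of late-night emails should not break up an otherwise quiet window. The point is only that nobody was asked anything: the estimate falls out of activity the person was going to produce anyway.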

Ironically, understanding the potential of direct data also leads to a certain tension. Could we design direct data artifacts purposefully into systems? If we did that, would it still be the benevolent, objective data we seek? Could we formalize and identify types of direct data more or less appropriate for a given system?

All of these are exciting and fascinating questions to explore as we further formalize the quantitative study of human behaviours.

[* modulo correct system time, which we'll save for another discussion]


One Comment

  1. vicki commented on November 18, 2009 | Permalink

    Interesting to read this as I index von Franz on the importance of single events vs. averages, large replicated sets of experiments, etc. I think you will enjoy the commentary on von Franz’s Number & Time when it is done.
    V.
