Chris Anderson, editor-in-chief of Wired magazine, has a very interesting article in a recent issue of Wired in which he describes a paradigm shift in research. Abundant data, he postulates, is changing the way research progresses. Instead of needing theories to understand data, we may find that with almost infinite data, theories are no longer needed, because the answers lie in the data itself. For example, current scientific thinking holds that correlation alone is not a sufficient basis for understanding, since a correlation may simply be due to error or coincidence. Given enough data, however, correlation IS relevant, because there is enough data to confirm or rule out its statistical significance.

In reading his article I was reminded of the symposium on usage-based language acquisition organised by Nick Ellis at AILA last week. His work, and that of his co-presenters, shows (among other frequency-related aspects of language) the importance of lexis for grammar, and questions whether the two can be separated at all. Lexis is, of course, a form of data (as opposed to rules), and what Ellis's research has shown is that second language acquisition (SLA) relies on massive exposure to this data.

For language learning research, corpus data does not just offer evidence to support existing theories; it offers new theories in and of itself. By collecting and analysing enough data, we can see patterns that answer some of our existing questions, and perhaps even some questions we did not know we had. In other words: data = knowledge. (If you have doubts, look up the research done at the Large Hadron Collider. Staggering stuff.)