How we do it.
The UK Tech Innovation Index 2 takes data from three main sources:
- We look at every website we can find for the 3.2 million UK companies registered with Companies House. Using machine learning, we classify these companies into industrial sectors that aren’t defined in official statistics.
- We download all the events from Eventbrite and Meetup and see where people and businesses are meeting and talking about innovation. Using machine learning, we classify these events into the same industrial sectors as we use for companies.
- We search through 140 million scientific papers for topics related to these industrial sectors.
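As a rough illustration of the classification step in the list above, here is a minimal sketch of a text classifier that assigns a sector label to website or event text. It is not our production model: the naive Bayes approach, the `TinyNaiveBayes` class, the sector names, and the training snippets are all assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # sector -> word frequencies
        self.doc_counts = Counter()              # sector -> number of examples
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training snippets standing in for scraped website text.
texts = [
    "We build machine learning models and AI software for retailers",
    "Deep learning and computer vision consultancy",
    "Payments platform and digital banking app for small businesses",
    "Online lending, payments and insurance technology",
]
labels = ["artificial intelligence", "artificial intelligence", "fintech", "fintech"]

model = TinyNaiveBayes().fit(texts, labels)
event_sector = model.predict("AI and machine learning meetup for computer vision engineers")
company_sector = model.predict("digital banking and payments app")
```

The same fitted model labels both company text and event text, which is what lets the two sources share one set of sectors.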
Then we add data on the links between our businesses, events, and papers.
Was an event held at a University? Does a company share directors with another company? Or did it sponsor an event?
Links answer those kinds of questions.
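One way to picture these links is as a list of typed connections between things. The sketch below is only an illustration: the company, event, and venue names, the link types, and the `build_links` helper are hypothetical, and real data would be far messier.

```python
from itertools import combinations

# Hypothetical inputs: none of these names come from real data.
directors = {
    "Company A": {"J. Smith", "P. Patel"},
    "Company B": {"P. Patel"},
    "Company C": {"L. Jones"},
}
sponsorships = [("Company A", "Event 1")]
venues = {"Event 1": "University X", "Event 2": "Cafe Y"}


def build_links(directors, sponsorships, venues):
    """Return (source, target, link_type) tuples for the three example questions."""
    links = []
    # Does a company share directors with another company?
    for a, b in combinations(sorted(directors), 2):
        if directors[a] & directors[b]:
            links.append((a, b, "shares_director"))
    # Did a company sponsor an event?
    for company, event in sponsorships:
        links.append((company, event, "sponsored"))
    # Where was an event held?
    for event, venue in sorted(venues.items()):
        links.append((event, venue, "held_at"))
    return links


links = build_links(directors, sponsorships, venues)
held_at_university = [
    event for event, venue, kind in links
    if kind == "held_at" and "University" in venue
]
```

Each question from the text becomes a simple filter over the list, as `held_at_university` shows for events held at a university.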
We then put all of our businesses, events, and papers on a map and pull them together into clusters: groups of industry and innovation that work together. We cluster mostly by geography, because nearby people and companies tend to work with each other more. We also consider the strength of the links between the things in our database.
What we don’t do in this version of the UK Tech Innovation Index is follow existing boundaries, cities, and towns when calculating our clusters.
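To make the clustering idea concrete, here is a minimal sketch that groups items mostly by distance and lets strong links stretch that distance a little. It is not the algorithm we actually run. The `cluster` function, the `max_km` and `link_weight` parameters, the item names, and the approximate coordinates are all assumptions chosen for illustration. Note that no city or council boundary appears anywhere in the code: only coordinates and links.

```python
import math
from itertools import combinations


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cluster(items, links, max_km=18.0, link_weight=0.5):
    """Group items whose link-adjusted distance is at most max_km.

    items: name -> (lat, lon); links: (name, name) -> strength in [0, 1].
    A link of strength s shrinks the distance by a factor (1 - link_weight * s),
    so geography dominates but strong links can pull items together.
    """
    parent = {name: name for name in items}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(items), 2):
        strength = links.get((a, b), links.get((b, a), 0.0))
        effective_km = haversine_km(items[a], items[b]) * (1 - link_weight * strength)
        if effective_km <= max_km:
            parent[find(a)] = find(b)

    groups = {}
    for name in items:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical items at approximate coordinates.
items = {
    "leeds_company": (53.80, -1.55),
    "bradford_event": (53.79, -1.75),
    "harrogate_company": (53.99, -1.54),
    "manchester_company": (53.48, -2.24),
    "salford_event": (53.49, -2.29),
}

by_geography = cluster(items, links={})
with_links = cluster(items, links={("leeds_company", "harrogate_company"): 0.6})
```

With geography alone, the Harrogate company sits just outside the Leeds and Bradford group. Adding a link to the Leeds company pulls it in, while the Manchester and Salford pair stays separate either way.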
Who else is doing this.
We’ve been doing this kind of thing at The Data City for years. We have long experience in marketing and brand management where these techniques are well-established. We know that our friends in financial services industries have been doing the same for even longer.
Recently some new places have been joining in and we think it’s fantastic. The examples that we know about in the UK are:
- City REDI at the University of Birmingham, who are using website analysis to classify industries in the same way that we do.
- TechNation (formerly Tech City and Tech North), who are using Eventbrite and Meetup data in the same way as we are.
- Nesta and Frontier Economics, who did early work with Meetup data that inspired us.
- The ONS, whose Data Science Campus are exploring alternative sources of data for experimental national statistics, including website analysis for industrial classification.
- The Bank of England, who recently said that they are doing some of this stuff (though we couldn’t really understand their speech on it).
- BEIS, who have used spatial clustering techniques similar to ours to explore moving beyond fixed geographies for analysing the economy.
But we know that there are many more people doing similar stuff. We’d love to hear from you.
More data than ever before is being used to understand our economy.
The ONS are using PAYE submissions and VAT returns to improve their statistics by massively increasing sample sizes. TechNation and Nesta are using the Business Structure Database to supplement the open data available from Companies House. Reports and data sources like TechNation and Creative Nation aggregate and make that closed data available to more people.
This is a really exciting time to be measuring the economy. Radical openness is changing everything.
Old national institutions with control of the single source of data and a fixed list of questions to answer with it will adapt or die. New institutions will compete, answering new questions in new ways, for different governments, communities, businesses, and geographies. International and local will matter as much as national. Independent, trusted, and innovative will matter more than established.
At The Data City we are working to be one of these new institutions. Part of being radically open means sharing what we find and how we find it, so that we learn from other people, inspire some more, and scare others.
That’s why we’ve written this blog. And that’s why all the data behind the UK Tech Innovation Index 2 is available for free, with an open license. Because we are independent, pioneering, and radically open.