Our project: The Data City in Lille and Leeds.
Leeds and Lille have been twin cities for 50 years. For even longer, the two cities and their city-regions have had a shared and similar history.
Both were at the heart of the industrial revolution, with huge strengths in the textile and chemical industries. Both were powered by surrounding coal, brought to them by canals and railways. Both have grown and evolved into modern European cities with common strengths in retail, design, food, technology, and civic open data.
With so many similarities there are also many opportunities to work together. By sharing experiences and data about what works in each city, we can achieve more and strengthen our links.
Having visited Lille, and drawing on our experience in Leeds, we have chosen to focus our work on understanding links that can be strengthened and strengths that can be shared in three areas:
- Euratechnologies, Lille. One of Europe’s largest startup hubs, it has hosted over 300 startups and helped them raise over €170 million from more than 100 funders. Its list of 191 associated companies is a fantastic data source for understanding how and where tech creates jobs and wealth. The success of this huge old mill and the tech ecosystem around it offers lessons for Leeds’ South Bank development, which also contains huge old mills and has similar aspirations.
- In 2020 Lille will celebrate its designation as World Design Capital. Leeds has the chance to invite many of Lille’s design leaders to new events such as Leeds International Festival and Leeds Digital Festival, and to share its leadership in Global Service Jam with Lille.
- For Leeds’ 2023 Capital of Culture plans there is much to learn from Lille’s 2004 experience.
Euratechnologies: uniting Lille’s tech ecosystem?
In February 2018 we visited Euratechnologies, Lille. It’s just over 3.5km west of the city’s international train stations, or 15 minutes on the frequent metro plus a short walk.
Two decades ago the huge old textile mill lay in ruins. The spinning business it housed had declined from 3000 employees at its peak to just a few hundred when it finally closed in the late 1980s.
Today, after over €30m of investment, the building is pristine, with a huge bright atrium for events between enormous office spaces for work. It bustles with technology businesses of all sizes. There are dozens of new start-ups, over a hundred companies that have passed through Euratechnologies’ mentorship, incubation, and scale-up programs. There are global companies like Microsoft and Capgemini, and offices of French national institutes for energy, information technology, and aerospace.
In 2017 the site reached a key target, and an obvious passion of its founders: to host more jobs than it did in the days of heavy industry. Today 3500 people work in technology companies on site or in the new office buildings springing up around it. At least one event a week attracts people from all over Lille and its region.
This once run-down district of Lille is today coming to life and cranes are everywhere. They are building offices, flats, restaurants, and bars. The observation from 2011 in an LSE study that the area had a “clear lack of any restaurants, shops, or support businesses in what seemed a remote neighbourhood” is no longer true.
This success should prompt important questions for Leeds:
- Is Euratechnologies a success that Leeds should emulate?
- Has it created extra tech activity in the city or merely concentrated activity which existed already?
- Is its location at a distance from the city-centre an advantage or a disadvantage?
- Could it have succeeded without the metro?
At The Data City, we wanted to find out whether our tools and techniques could help to answer these questions.
Tech-related employment in Lille.
We started by looking at the location of technology-related businesses in both cities.
Looking first at the data for Lille, we put every technology-related business in the Lille Metropole on a map, with the size of each marker corresponding to the number of people employed at each site.
What’s clear is that Euratechnologies has already created a substantial hub of tech employment that is visible in the data.
Creating as good a map for Leeds is not possible using open data in the UK. The French companies open dataset, Sirène, contains employment information, both for the company at each location and nationally where it has more than one site. To get similar data for the UK, you need access to the UK Business Structure Database, which is available only at secure locations and is subject to onerous access requirements and clearance.
Some good work has been done using this dataset, for example Centre for Cities’ analysis of the growth of TV, radio, and film companies at Salford Quays since its redevelopment as Media City UK, and Nesta’s Geography of Creativity work. But the data is not available in a useful or affordable form for us.
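A minimal sketch of how marker sizes on a map like this can be derived. It assumes hypothetical records of the form (site_id, lat, lon, employees) with made-up values; Sirène itself reports headcount as size bands, so a real pipeline would first have to turn each band into a number (for example its midpoint). Scaling the radius with the square root of headcount keeps marker area, rather than radius, proportional to employment, so a site with nine times the staff does not look eighty-one times bigger.

```python
import math
from collections import defaultdict


def aggregate_sites(records):
    """Sum employees per site, keeping each site's coordinates.

    records: iterable of (site_id, lat, lon, employees) tuples. This layout
    is a hypothetical simplification, not the Sirène schema.
    """
    totals = defaultdict(int)
    coords = {}
    for site_id, lat, lon, employees in records:
        totals[site_id] += employees
        coords[site_id] = (lat, lon)
    return {site: (coords[site], totals[site]) for site in totals}


def marker_radius(employees, max_employees, max_radius=30.0):
    """Radius proportional to sqrt(employees), so marker area tracks headcount."""
    if max_employees <= 0:
        return 0.0
    return max_radius * math.sqrt(employees / max_employees)


if __name__ == "__main__":
    # Made-up example data: two records for site A, one for site B.
    sites = aggregate_sites([
        ("A", 50.63, 3.02, 400),
        ("A", 50.63, 3.02, 500),
        ("B", 50.64, 3.06, 100),
    ])
    largest = max(total for _, total in sites.values())
    for site, ((lat, lon), total) in sorted(sites.items()):
        print(site, total, round(marker_radius(total, largest), 1))
```

Run as a script, this prints `A 900 30.0` and `B 100 10.0`: site B has a ninth of A's staff and a third of its radius.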
This means that for now the best that we can do for Leeds is show where tech-related businesses are, with no reference to how many people they employ. Despite this restriction, we can see an important difference between Leeds and Lille.
Leeds has made efforts, and had some success, in encouraging businesses to return to Holbeck and Leeds Dock, deindustrialised areas similar to the one where Euratechnologies stands. But the data suggests that Leeds’ efforts to grow a tech and creative business hub south of its river have not matched Lille’s success with Euratechnologies.
With its forthcoming South Bank development, Leeds will extend its city centre and provide more space for growth, as Euratechnologies has done in Lille.
Progress during this project and recommendations for the UK
We have made progress in two main areas during this project, one which we have shared in detail, and one which we have not.
I have just described how we used the Sirène open dataset and the UK Companies House open dataset to compare technology clusters in Lille and Leeds. Mapping either one requires an open address database: the BAN (Base Adresse Nationale) in France and Code-Point Open in the UK. There are excellent guides to geocoding the Sirène dataset online.
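On the UK side, geocoding comes down to a join on postcode: each Companies House record carries a registered-office postcode, and Code-Point Open gives a location for each postcode. The sketch below assumes Code-Point Open's headerless CSV layout with the postcode in column 0 and eastings/northings in columns 2 and 3; the rows and company names are made up. The coordinates come out on the British National Grid, so they would need reprojecting before being mixed with latitude/longitude data, and a registered office is not always where a company's staff actually work.

```python
import csv
import io


def normalise_postcode(postcode):
    """Upper-case and drop all whitespace, so 'ls1 1aa' matches 'LS1 1AA'."""
    return "".join(postcode.split()).upper()


def load_codepoint(csv_text):
    """Build a postcode -> (easting, northing) lookup from Code-Point Open CSV.

    Assumes the headerless layout with the postcode in column 0 and
    eastings/northings (British National Grid) in columns 2 and 3.
    """
    lookup = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        lookup[normalise_postcode(row[0])] = (int(row[2]), int(row[3]))
    return lookup


def geocode(companies, lookup):
    """Attach coordinates to (name, postcode) pairs; None where unmatched."""
    return [(name, lookup.get(normalise_postcode(pc))) for name, pc in companies]


if __name__ == "__main__":
    # Made-up rows and companies, for illustration only.
    sample = '"LS1 1AA",10,430000,433000\n"LS2 2BB",10,429500,434000\n'
    lookup = load_codepoint(sample)
    print(geocode([("Example Ltd", "ls1 1aa"), ("Other Ltd", "LS9 9ZZ")], lookup))
```

Normalising both sides by stripping whitespace sidesteps differences in how postcodes are padded or spaced between datasets.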
There are lessons for the UK in the greater power of the Sirène dataset, specifically the information it contains on company size.
One area where we have made more progress in the UK is the classification of businesses into niches of technology. In the work we have done on this project we have used SIC codes to classify industries, even though we know it to be a poor method.
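Classifying by SIC code amounts to checking each company's codes against a chosen set. The sketch below assumes Companies House SIC entries in the form "62012 - Business and domestic software development"; the set of "tech" codes is an illustrative subset, not the list we use.

```python
# Illustrative subset of SIC 2007 codes treated as "tech" here.
# This is NOT The Data City's list, just an example for the sketch.
TECH_SIC_CODES = {
    "58290",  # Other software publishing
    "62011",  # Ready-made interactive leisure and entertainment software development
    "62012",  # Business and domestic software development
    "62020",  # Information technology consultancy activities
    "62090",  # Other information technology service activities
    "63110",  # Data processing, hosting and related activities
}


def sic_code(entry):
    """Take the leading 5-digit code from entries like '62012 - Business ...'."""
    return entry.strip().split(" ", 1)[0]


def is_tech(sic_entries):
    """True if any of a company's SIC entries is in the tech set."""
    return any(sic_code(e) in TECH_SIC_CODES for e in sic_entries)


if __name__ == "__main__":
    companies = {  # made-up companies
        "Example Software Ltd": ["62012 - Business and domestic software development"],
        "Example Bistro Ltd": ["56101 - Licensed restaurants"],
    }
    print([name for name, sics in companies.items() if is_tech(sics)])
```

This also shows why the method is poor: every company filed under 62012 looks identical, whether it builds games, IoT platforms, or payroll software, so niches of technology are invisible.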
During this project we have also made significant progress in moving beyond SIC codes and extending The Data City approach to industry classification. On its website, Euratechnologies publishes a manually curated list of 191 companies that have passed through its start-up and incubation schemes or that rent space in its offices. This is a closed dataset, which we have scraped. We will not share it without permission from Euratechnologies.
This dataset has allowed us to prototype a method for extending our industry classification techniques to all French companies. It plays exactly the same role as the training set of manually classified companies that we used to create the IoT UK Nation open dataset.
Our method for classifying companies into industrial sectors is described in our “industrial sector classification using machine learning” blog post from January. Key to the method is having a manually classified list of companies, ideally available under an open or permissive license.
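The blog post describes the method we actually use. The sketch below is a generic stand-in, not a reproduction of it: a multinomial naive Bayes text classifier trained on made-up, manually labelled company descriptions. It shows why the manually classified list is so central, since it is the only thing the model learns from; the class and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lower-cased runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for w in tokens(text):
                if w not in self.vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


if __name__ == "__main__":
    # Made-up, manually labelled training examples.
    texts = [
        "connected sensors for smart buildings",
        "iot platform for industrial sensors",
        "online payments for retailers",
        "mobile banking and payments app",
    ]
    labels = ["iot", "iot", "fintech", "fintech"]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("wireless sensors network"))  # iot
    print(model.predict("payments for shops"))        # fintech
```

Whatever categories and wording the labelled list uses, the model inherits; a list drawn from a single source carries that source's definitions with it.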
Our current prototype classification of French companies by industrial sector is based on a sample of companies in Lille, so it is biased towards the definitions used by Euratechnologies. And because it is built on Euratechnologies’ closed data, it has not produced output that we can share.
We continue to look for more sources of manually classified company data to diversify our biases, and so reduce their impact. We are also looking for an open list of manually classified companies in France, so that we can use our results more widely and share them back for other people to use.
- Sirène (French companies) open dataset
- UK Companies House open dataset
- BAN (French addresses) open dataset
- Code-Point Open (UK postcode locations) open dataset
- Euratechnologies companies list (191 manually-classified tech companies in Lille). Closed dataset.
- IoT UK Nation open dataset (only used to extend the method to France).
The Data City process analyses scientific publications as a way of seeing where, and in what fields, innovation is taking place. We use Microsoft Academic Knowledge as our primary source for this data.
We find about 1 million papers published with an author in the UK, and about 400 thousand papers published with an author in France. This is fewer French papers than we’d expect, but the ratio isn’t too far off expectations: other publication-tracking services show that UK science is extremely productive at publishing papers, with 183 thousand papers published in 2016 compared with 113 thousand in France.
In this project we looked at two things that we haven’t explored in detail before: co-publication, and specialisation within France. Specifically, we asked:
- Which places do universities and industry in Lille co-publish with?
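The comparison can be made explicit with the rounded figures above. The sketch assumes our UK count is right and uses the trackers' UK:France ratio to estimate how many French papers we would then expect; note that the two pairs of figures may not cover the same period.

```python
# Rounded counts quoted in the text.
mak_uk, mak_fr = 1_000_000, 400_000        # papers found via Microsoft Academic Knowledge
tracked_uk, tracked_fr = 183_000, 113_000  # papers published in 2016, per other trackers

mak_ratio = mak_uk / mak_fr              # UK papers per French paper in our data
tracked_ratio = tracked_uk / tracked_fr  # the same ratio from the trackers
expected_fr = mak_uk / tracked_ratio     # French papers expected if our UK count is right

print(f"MAK ratio {mak_ratio:.2f}, tracker ratio {tracked_ratio:.2f}")
print(f"Expected French papers: about {expected_fr:,.0f}")
```

Our data gives a ratio of 2.50 against the trackers' 1.62, which would imply roughly 617 thousand French papers rather than the 400 thousand we found.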
- What fields of study are universities and industry in Lille strongest in?
We were able to make an early simplification: every paper we found from an institution in the Nord department (of which Lille is the capital) came from an institution in the Lille Metropole, so we could treat Nord and Lille as one.
Who do Lille’s researchers co-publish with?
Researchers in Lille co-publish most frequently with researchers in the USA, the UK, and Canada — the same top three as for the whole of France.
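Counting co-publication partners can be sketched as follows. Each paper is assumed, hypothetically, to be a list of (city, country) affiliations, which is a simplification rather than the Microsoft Academic Knowledge schema; the papers are made up. A country is counted at most once per paper, so a paper with many authors from one country does not dominate, and papers whose only partners are French contribute nothing.

```python
from collections import Counter


def copublication_counts(papers, home_city="Lille", home_country="France"):
    """For papers with at least one author in home_city, count how many
    papers involve each other country (one count per country per paper).

    papers: list of papers, each a list of (city, country) affiliations.
    This structure is a hypothetical simplification, not the MAK schema.
    """
    counts = Counter()
    for affiliations in papers:
        if not any(city == home_city for city, _ in affiliations):
            continue
        partners = {country for _, country in affiliations if country != home_country}
        counts.update(partners)
    return counts


if __name__ == "__main__":
    papers = [  # made-up example papers
        [("Lille", "France"), ("Boston", "USA"), ("Leeds", "UK")],
        [("Lille", "France"), ("Montreal", "Canada"), ("Toronto", "Canada")],
        [("Lille", "France"), ("Paris", "France")],
        [("Lyon", "France"), ("London", "UK")],
    ]
    print(copublication_counts(papers))
```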
Lille’s greater co-publishing with Belgium is likely to be because Belgium is so close. Similarly, its lower co-publishing with Spain and Germany is probably because those countries are further from Lille than from many other regions of France.
We looked more closely at collaboration between Lille and the UK, to see whether Leeds and Lille’s twinning and shared industrial heritage have any impact.
The results are pretty clear: there is no obvious legacy of twinning or shared industrial history in academic co-publishing between Leeds and Lille.
So how do places end up collaborating on research? We decided to look a bit more closely, by investigating what Lille is good at researching.
What do Lille’s researchers work and collaborate on?
We looked at the top 12 fields for scientific publication in the world, in France, and in Lille.
Lille’s strengths are heavily skewed towards computational and mathematical methods, probably reflecting the presence of INRIA (the French National Institute for Research in Computer Science and Automation) in the city.
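The text ranks the top fields rather than stating a formal measure of specialisation. One common way to quantify it, swapped in here for illustration, is the location quotient: a field's share of local output divided by its share of national output, with values above 1 suggesting local specialisation. The paper counts below are made up.

```python
def location_quotients(local, national):
    """Location quotient per field: the field's share of local output divided
    by its share of national output. LQ > 1 suggests local specialisation."""
    local_total = sum(local.values())
    national_total = sum(national.values())
    return {
        field: (local[field] / local_total) / (national[field] / national_total)
        for field in local
        if national.get(field)  # skip fields with no national papers
    }


if __name__ == "__main__":
    # Made-up paper counts by field, for illustration only.
    lille = {"Computer science": 300, "Mathematics": 150, "Medicine": 300, "Chemistry": 250}
    france = {"Computer science": 15000, "Mathematics": 10000, "Medicine": 45000, "Chemistry": 30000}
    for field, lq in sorted(location_quotients(lille, france).items(), key=lambda kv: -kv[1]):
        print(f"{field:<17} {lq:.2f}")
```

Unlike a raw top-12 list, the quotient separates fields that are large everywhere (such as medicine in this toy data) from fields that are unusually large in one place.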
Lille’s specialisation in computational methods is likely what leads to its stronger links with the UK universities that have the strongest computer science departments, such as Cambridge, Imperial College, and Manchester.
In conclusion, our techniques can pick out Lille’s comparative advantages in scientific research, and these appear to shape the international collaborations of Lille’s researchers. We see no impact of Leeds and Lille’s twinning arrangements or shared industrial history on their scientific collaboration today.
But there is one area of research that we have not yet explored enough.
The UK publishes over 50% more scientific papers per year than France, but France produces about 30% more patents per head of population than the UK (369 vs 290 per million population in 2016; Figure A42).
It might be that Lille and Leeds share strengths in R&D that aren’t captured in scientific papers but that are captured in patents.
We have not yet been able to investigate this. In order to do so, we would need full access to the European Patent Office (EPO) dataset. That is something that we are working on.
- Microsoft Academic Knowledge (MAK) (paid, not open).
- Google geocoder, to geolocate all institutions in the MAK dataset (paid, not open).
- Mapzen’s open country outline shapefiles (derived from OpenStreetMap) to assign institutions to countries (open).
- A shapefile of all of France’s départements (derived from OpenStreetMap) to assign institutions to départements in France (open).
- The European Patent Office raw patent data, which we have not used in this research but intend to in the future (paid, but not much; closed, but with generous terms; hard to access).
- NUTS (levels 0, 1, 2, and 3) boundaries from Eurostat.
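Assigning a geocoded institution to a country or département with these shapefiles comes down to a point-in-polygon test. The sketch below implements even-odd ray casting on a single simple polygon, using toy square "regions" rather than real boundaries. Real outlines are multipolygons, sometimes with holes, and would normally be read from the shapefile and tested with a GIS library (for example shapely's `Polygon.contains`); points lying exactly on a boundary are ambiguous here.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: cast a ray to the right of (x, y) and count
    how many polygon edges it crosses; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate(lon, lat, regions):
    """Name of the first region whose outline contains (lon, lat), else None."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


if __name__ == "__main__":
    # Toy regions, not real boundaries: two adjacent squares.
    regions = {
        "Region A": [(0, 0), (2, 0), (2, 2), (0, 2)],
        "Region B": [(2, 0), (4, 0), (4, 2), (2, 2)],
    }
    print(locate(1, 1, regions), locate(3, 1, regions), locate(5, 5, regions))
```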
Measuring tech ecosystems through meetings
Meetings and events are a widely-used proxy for collaboration when trying to measure innovation ecosystems. In our early work at The Data City we focused on extending methods used in early Tech Nation reports and by Nesta. Specifically, we wanted to draw data from more events services.
By collecting more data we were able to look not just at the total number of events, but also at their themes. With this method we found evidence of small clusters that, by being highly specialised, reached the densities that we think boost innovation.
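Looking at themes as well as totals can be sketched with simple keyword tagging of event titles, counted per city. The theme names and keyword lists below are hypothetical, chosen only for illustration, and the events are made up; the text does not say how themes were assigned in practice.

```python
import re
from collections import Counter

# Hypothetical theme keyword lists, for illustration only.
THEMES = {
    "iot": {"iot", "sensors", "lorawan", "arduino"},
    "data": {"data", "analytics", "python"},
    "design": {"ux", "design", "service"},
}


def themes_for(title):
    """Themes whose keywords appear as whole words in an event title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return sorted(theme for theme, keywords in THEMES.items() if words & keywords)


def city_theme_counts(events):
    """Count events per (city, theme); an event can count towards several themes."""
    counts = Counter()
    for city, title in events:
        for theme in themes_for(title):
            counts[(city, theme)] += 1
    return counts


if __name__ == "__main__":
    events = [  # made-up events
        ("Leeds", "IoT sensors hack night"),
        ("Leeds", "Arduino and LoRaWAN workshop"),
        ("Lille", "UX design breakfast"),
        ("Lille", "Python data analytics meetup"),
    ]
    print(city_theme_counts(events))
```

A city with few events overall can still show a high count in one theme, which is the kind of small, specialised cluster that totals alone would hide.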
Our work in Lille has given us the resources to expand our approach to measuring events beyond the UK. The results have been interesting, and challenging.
Our scripts currently collect every tech-related Eventbrite event in the world every week. At the same frequency, we also collect every Meetup event in the UK, Ireland, Spain, France, and Portugal. In this report we focus on France and how it compares with the UK.
Overall, the UK has more events than France: 854 compared with 518. This doesn’t feel quite right, but the numbers are similar enough not to ring any alarm bells. Where we start to see big differences is when we look at cities.
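Collecting every week means the same event usually appears in several snapshots, so counts need de-duplicating before comparing countries. A minimal sketch, assuming hypothetical event records with "platform", "id", and "country" fields and snapshots supplied in chronological order; the events are made up.

```python
from collections import Counter


def merge_snapshots(snapshots):
    """Combine weekly snapshots into one record per (platform, event id).

    An event listed in several weeks is kept once; the latest snapshot wins,
    so later edits (e.g. a changed venue) replace earlier versions.
    Field names ("platform", "id", "country") are hypothetical.
    """
    events = {}
    for snapshot in snapshots:  # assumed to be in chronological order
        for event in snapshot:
            events[(event["platform"], event["id"])] = event
    return list(events.values())


def events_by_country(events):
    """Number of distinct events per country."""
    return Counter(event["country"] for event in events)


if __name__ == "__main__":
    week1 = [
        {"platform": "meetup", "id": "1", "country": "UK"},
        {"platform": "eventbrite", "id": "1", "country": "France"},
    ]
    week2 = [
        {"platform": "meetup", "id": "1", "country": "UK"},  # same event again
        {"platform": "meetup", "id": "2", "country": "France"},
    ]
    print(events_by_country(merge_snapshots([week1, week2])))
```

Keying on the platform as well as the ID matters because two services can use the same identifier for unrelated events.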
There are almost no tech events in Lille. Since we are looking specifically at Lille, this result was very surprising. Where are all the events in Lille?
So we looked at Eventbrite events instead, hoping that Lille’s tech scene might appear. But the problem just got bigger.
On Eventbrite, no French city except Paris makes the top ten cities for activity across the UK and France. Looking at events by country explained the problem: Eventbrite is much more popular in the English-speaking world than outside it.
Where are the events?
One of the reasons that we started The Data City was that we felt that Leeds and other cities in the North of England were being poorly represented in national assessments of tech innovation. Specifically, when only Meetup events were used to judge a city’s tech meetings, we were losing out.
By adding Eventbrite events, we improved the scoring system. For us.
But we soon heard from Scotland that we should add Open Tech Calendar. And once we did, Scotland was better represented too.
Going into this project we assumed that we would find something similar in France. We hoped that we would find a popular events service that we didn’t know about that we could add to The Data City.
We’d already done some research. With nearly 1 million members in the French-speaking world, OnVaSortir looked like exactly what we were hoping to find; it even had some events of the type we were looking for.
Sadly, the site didn’t live up to our hopes. First, it is focused on socialising, and when we asked people in France about it, they said that they almost never used it for tech meetups and events. Second, without an API or any obvious method for downloading events in bulk, there was no way to get the data that we needed to analyse.
So where are the events in Lille?
It would be tempting to think that maybe Lille just doesn’t have many tech events. After all, other French cities have plenty of tech events in Meetup, but Lille doesn’t.
The problem with this theory is that it was easy for us to find at least one good tech or innovation event to attend in Lille for every night of the week that we were there. The data that Meetup and Eventbrite were giving us was clearly not a good reflection of what was very obviously happening in the city.
It turns out that many, perhaps most, tech events in Lille are organised outside any centralised platform like Meetup or Eventbrite. Euratechnologies’ event list is a good example: most of its events have custom sign-up pages.
This causes us a problem. We could pretty easily scrape Euratechnologies’ events calendar, but that only solves the problem for one venue, in one city, in France.
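To show why scraping one venue doesn't scale, here is a minimal scraper using Python's standard `html.parser`. The markup it expects (a `div` with class `event` containing a `time` element with a `datetime` attribute and an `h3` title) is entirely hypothetical, not Euratechnologies' real page structure, and it assumes no nested `div` inside an event block.

```python
from html.parser import HTMLParser


class EventListParser(HTMLParser):
    """Collect (date, title) pairs from hypothetical markup of the form
    <div class="event"><time datetime="..."></time><h3>Title</h3></div>."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._in_event = False
        self._in_title = False
        self._date = None
        self._title = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and "event" in (attrs.get("class") or "").split():
            self._in_event, self._date, self._title = True, None, []
        elif self._in_event and tag == "time":
            self._date = attrs.get("datetime")
        elif self._in_event and tag == "h3":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False
        elif tag == "div" and self._in_event:
            # Closing the event block: record what we collected.
            self.events.append((self._date, "".join(self._title).strip()))
            self._in_event = False

    def handle_data(self, data):
        if self._in_title:
            self._title.append(data)


if __name__ == "__main__":
    html = ('<div class="event"><time datetime="2018-02-13"></time><h3>AI meetup</h3></div>'
            '<div class="event"><time datetime="2018-02-14"></time><h3> Startup pitch </h3></div>')
    parser = EventListParser()
    parser.feed(html)
    parser.close()
    print(parser.events)
```

Every tag name and class here is tied to one site's layout, so each new venue needs its own parser, and any redesign breaks it.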
There is no easy fix to this problem and we have not found a comparable way to get tech events data for the UK and France. Through this project we have developed methods to compare the UK, English-speaking Canada, and the USA, but these methods don’t work well for France, or at all for Lille.
Culture and measurement
Without an easy solution, it’s important to understand, accept, and share the limitations of our technique. When it comes to measuring innovation, our methods at The Data City use events data. Where that events data isn’t available, our methods are less reliable.
This leads us to a more philosophical point. All of us ask questions based on our own situation. At The Data City we are interested in events data because we go to lots of events, and because we know that the data is available. We all live in the UK. If we lived in France, we would probably have considered very different proxies for measuring the amount of collaboration in the tech ecosystem.
Culture is important in all studies, and using lots of data doesn’t change that.
My favourite way of showing this is to look at Eventbrite tech events in Belgium.
To a first approximation, the southern half of the country is French-speaking, like France. It contains large cities like Charleroi and Liège. About a third of Belgium’s population lives there.
And yet less than 3% of tech events on Eventbrite in Belgium take place in Wallonia, the French-speaking region, which borders the Lille area.
It is extremely unlikely that French-speakers do tech and meet to talk about tech massively less than Dutch speakers in the North.
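To put a number on "massively less", a quick calculation with the rounded figures above (a third of the population, 3% of events) compares events per head in Wallonia with events per head in the rest of Belgium, Flanders and Brussels together. Because the event share is an upper bound ("less than 3%"), the result is an upper bound too.

```python
def per_capita_ratio(event_share, population_share):
    """Events per head in a region relative to events per head in the
    rest of the country."""
    region_rate = event_share / population_share
    rest_rate = (1 - event_share) / (1 - population_share)
    return region_rate / rest_rate


ratio = per_capita_ratio(event_share=0.03, population_share=1 / 3)
print(f"Wallonia: {ratio:.3f} of the per-head rate elsewhere (about 1 in {1 / ratio:.0f})")
```

This prints `Wallonia: 0.062 of the per-head rate elsewhere (about 1 in 16)`: on Eventbrite, Wallonia would have at most about a sixteenth of the tech events per head seen in the rest of Belgium.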
What we are seeing is almost certainly a cultural difference in how tech events are shared and advertised. It is probably similar to the difference between the UK and France that left us unable to measure the ecosystem of tech events in Lille using the methods we developed in the UK.
- Eventbrite data (available via a good API, closed data, restricted usage).
- Meetup data (available via a good API, closed data, restricted usage).
- OnVaSortir data (unavailable and not very relevant).