RIDL

SHOW ME THE DATA PODCAST

SEASON 3: EPISODE 5

Data Skills That Stay: NICD's Approach to Capability Building

CONVERSATION WITH DR HOLLIE JOHNSON AND DR TOM VERHELST - 24 minutes 09 seconds

Managing vast amounts of data can be quite overwhelming, regardless of whether your organisation operates within the public sector, functions as a small enterprise, or is a large corporation.

Unlike typical consulting firms, the National Innovation Centre for Data (NICD) equips teams with the skills to manage future data projects independently, so that the expertise and capability stay in-house. In this episode, we’re joined by a former Senior Data Scientist, Dr Hollie Johnson from NICD in Newcastle, UK, who shares their experience of working within NICD and of leading a unique data project with a start-up that specialises in technology systems for grid decarbonisation using second-life vehicle batteries. Our guest also discusses the transformative journey from manual spreadsheets to automated data collection and analysis, demonstrating how NICD is democratising data. They delve into what it means to be data-driven, the value of collaborating with experts in understanding moral and ethical grey areas, and the vision for a similar initiative in Australia focused on upskilling and bridging the gap between data beginners and experts. The conversation also covers clean, sustainable data science practices, advice for aspiring data professionals, staying current with trends, potential data gaps, and the dream of having access to transport data. This episode is packed with insights for anyone interested in the evolving landscape of data science and its impact.
EPISODE TRANSCRIPT

Rhetta Chappell (host): Hi and welcome to Show Me the Data, a podcast where we discuss the many ways in which our lives and the decisions we make are impacted by and depend on data. I’m Rhetta, your host for today, and I’m a Data Scientist & Partnerships Lead at Griffith University. Today, Tom and I have the distinct privilege of hosting our international guest, Dr. Hollie Johnson, from the renowned National Innovation Centre for Data, or NICD. NICD is at Newcastle University, which is based in the UK. And what you’ll soon notice about them is that they have a very unique value proposition and service offering. As a data scientist, Hollie has a rich background in maths, statistics and software development, and we can’t wait to share this conversation with you.

Show Me the Data acknowledges the Jagera peoples who are the traditional custodians of the land on which we are recording today. And we pay respect to the Elders past, present and emerging.

Thank you so much for being here today, Hollie. We’re going to start with what is the National Innovation Centre for Data and how do you guys help organisations?

Dr Hollie Johnson: The National Innovation Centre for Data was set up about seven years ago, primarily to address the kind of skills gap that exists in data science and kind of related fields, largely focusing on the northeast of England. The idea is that we would work with organisations side by side on project-based kind of engagements to help people from those organisations upskill in various kind of skills around data science and to take those forwards into the organisation and continue to do more impactful data science work beyond the scope of our engagement.

Dr Tom Verhelst: So to give our listeners a deeper sense of the sorts of projects and clients you work with at NICD, could you describe the impact of your favourite data project in terms of learning and practical takeaways?

Hollie: I guess I’m going to talk about the one that I’m kind of most familiar with, the most recent one, and the one where I feel like I have had the most impact. This was with a company that manages second life electric vehicle batteries. So they are a small organisation based in the northeast of England. They’re growing rapidly, so they’re planning to scale significantly over the coming two years. But at the moment, their data team is about four people. When we started working with them, their data team was one guy. He was working on his own. He was put into that role because he was the one with a maths background, so he was seen as kind of the most appropriate person. And they came to us to really help them do a little bit more with the data that they had. So they had a lot of Excel spreadsheets of information about the state of these second life vehicle batteries, and they wanted to use that more effectively, both for themselves and for their clients, but also because safety is obviously the number one priority in that kind of system. Understanding what’s going on allows them to put in appropriate safety measures. So the first interaction we had with them was helping the client understand how they could pull in this data from Excel spreadsheets into a kind of automated cloud system that would clean, process and collate all of that data and put it somewhere central where they could use it. So this was really great, right? This meant they had somewhere central that they could take data from and use it for whatever they wanted, but they didn’t really have the skills then to go ahead and do anything with this. The second stage of the project was to help them build an integrated dashboard that they could use internally for safety, but they could also offer it to their clients, and their clients could then use that to pull down kind of reports of different periods of time, different battery systems, etc.
And through this, they kind of gained the skills in dashboard development and dashboard deployment on top of the kind of cloud infrastructure skills that they developed in the previous phase. So that was really great for them and that meant they could offer something of higher value to their clients, as well as improving their internal sort of efficiencies as well. The third stage then was a bit more sophisticated. They wanted to know, well, can we understand if something unexpected happens with the battery system? And so we helped them build an LSTM-based neural net model to detect anomalies in these battery systems, looking at a whole range of variables. And this introduced them to coding and programming in Python, to deep learning, and to deployment of models and monitoring of models. And this is the point at which that team started to grow.

It’s been really exciting for me because I joined the project towards that later phase, so taking over the machine learning, deep learning side of it. That’s my background, and it’s been really nice for us as an organisation to work on because it’s really shown how we can pull in different people from our team at the right time to deliver where we have the best expertise. And it’s been nice to see their team also grow in that same way. And they’ve brought on new junior staff who have also developed skills through working with us. And it’s just great to see that they’re really becoming a much more data driven organisation as a result of this engagement. It’s not something they’ve done once and kind of put aside. It’s become really fundamental to the way they work. And yeah, I think they’re really kind of a true north east England success story.

Rhetta: And I think pulling on something that you said there, they’re really now becoming a data driven organisation. We work with a lot of organisations. We talk to a lot of different people. Do you have a definition like what does it mean to be a data driven organisation?

Hollie: I think there’s different ways in which you can be data driven, and I’m talking more about non-tech organisations here. So whether that’s engineering, manufacturing, health care, all those things that you may think typically aren’t sort of data focused. I think data driven can mean that you’re making business decisions based on the data you’ve got in an informed way. But it could also mean that you’re innovating around data, and that’s kind of more the area in which we try and focus, and that’s helping organisations think about the data that they collect day to day through their normal business as usual, and how they can use that to create potentially new products and services that are driven by data. So in the previous example, that is exactly what they were doing. They were creating a new product and service that they could offer to their clients based on the data they were already collecting.

Rhetta: I think, like everybody talks about wanting to be data driven, but then you’re like, okay, well, what does that actually mean and what does it mean for you? 

Tom: I think it means very different things in different sectors, because if you look at a lot of industries, they’re very data driven in sort of trying to sell something or trying to make sure that you keep going to them or trying to make sure they’re happy about your products. But then if you look at the back end of what they’re doing, or if you look at compliance, that is completely not data driven. So even within an industry or within a company, you can have places where the organisation is extremely data driven, even leading within the industry. And then if you look at other parts of the organisation, they’re just still using paper. Yeah. So that’s, yeah, I think that’s where the challenge is. It’s not like electricity, where everyone uses electricity, otherwise nothing works. I’m just thinking about, for example, financial institutions. A lot of the data driven component sits at the front end, you know, getting customers in, assessing their risk, trying to sell them things, trying to sort of show, you know, how your consumer spending is going. There’s all sorts of things going on, but then there’s a lot of things that are still completely not data driven.

Hollie: And that’s a really good point there. And, you know, we talk to a lot of organisations, particularly small organisations, who are collecting data. I think where they can benefit the most is to have that in a centralised place where everybody across the organisation can access it and use it if they want to. It’s not restricted to a data team or marketing team or a finance team. And it’s that democratisation of data that allows everybody in the organisation to think about the role they’re doing from a perspective that’s informed by data rather than being constrained to specific departments. And that’s maybe one of the steps forwards, which is obviously easier in a small organisation to start with. And it’s very difficult if you’re in a large organisation that hasn’t done that together. I think maybe that is the first step that organisations can take in becoming more data driven.

 

Tom: You’re right. I mean, and there’s also a resource constraint, right? I mean, the whole use of data is maturing. But if you look ten years ago at all of these roles, you would have to pay quite a lot of money for a good data scientist. And then they would have a very specific set of domain expertise. So if you’re paying quite a lot of money and you have to set up systems that are quite expensive, you want to basically put them to use in a place where you’re going to have most value for your company. And that’s not always immediately obvious. So it means that sometimes it goes in places where it’s the low hanging fruit, while a lot of the things that are more complicated provide more value but take a lot more time. So those central functions can help by sort of branching out and going to talk to individual groups and parts of the company and then uplift, you know, the capability slowly. And as more and more people get data science degrees and get experience in it, I think that skill shortage dissipates. And there’s also people who don’t just become a data scientist but have sort of expertise in a specific area, you know, in compliance or in sales, and then you sort of have capability growing within the subsections of an industry. If you look at anyone that has a leadership position or like a senior data scientist position, very rarely do they have a data science degree, because it didn’t exist. You need some time before those people become your leaders. It just didn’t exist.

So, where were we? Yes. Bringing it back to data science projects. What data science project are you most excited about? Either your own or someone else’s, like at OpenAI or Meta.

Hollie: Yeah. So I’m going to be really maybe a bit boring here and say that some of the stuff we’re doing internally, I think I’m most excited about that. There’s so much cool stuff out there and there’s so much hype and I’m wary about becoming too excited about things that I don’t fully understand. I think the things that are really exciting to me, and particularly coming from the perspective of the people I work with in the organisations that I spend time with, are the internal work we’re doing at NICD around, you know, creating ways for people to better engage with data science. That’s what’s exciting to me. I think we have this idea that data scientists do data science. And yes, they do. Hopefully they do that. But actually there’s a huge number of other people that can do data science and have a lot of the skills that are needed, maybe not the really niche technical skills, but a lot of those skills that we would be calling data science. Building kind of processes and mechanisms that empower those people to use data science more effectively in whatever job role they’re in, that’s what’s really interesting to me, and I think that’s something that we’re kind of starting to work on at NICD. And it is a process, right? It’s taking time. But I think there’s a huge amount of potential impact, you know, down the line from this.

Tom: It’s a bit like learning how to use a computer. 

Hollie: For sure.

 

Tom: Yeah, I mean, you’re right. It leads onto what I was saying before, where it’s like the domain expertise. You don’t have to know how to code, but if you see something and you know how that could be optimised, that is very valuable. Absolutely. Yeah. And that’s really missing, because if you’re just a data scientist that knows how to create algorithms, that has no idea of the domain, then you don’t know where to start. Then someone has to explain to you what problem they want to optimise. Yeah. And that’s, yeah, it’s easy for Netflix when you’re trying to pick a new movie. Well, it’s easy because it’s obvious, but it’s a lot harder in industry. And I would argue that most organisations, probably without any caveats, don’t need deep learning. They don’t need recommender systems. They don’t need large language models. They need the basic skills in those things that allow them to do what they’re trying to do more effectively or more efficiently.

And sure, it’s cool if you can get that stuff in and it can help you innovate and that is really great. But for the majority of organisations, that is just hype. And I think it creates a fear around doing stuff that actually matters and can make a difference. And it’s just trying to find that line between exciting, forward thinking innovation and actually taking that first step forwards.

Rhetta: And building on that, when you’re working with these organisations, and I imagine they come to you with different levels of capability and data skills and things like that, how do you work with your clients when it comes to addressing these kind of adoptions of new technologies and things we might not fully understand yet, and kind of address those ethical or moral grey areas?

Hollie: Yeah, so it’s a really interesting question because I think, you know, we see everything from people who are learning how to read in a dataset into some coding language to do analysis to people that are looking to understand how to assure large deep neural networks for medical applications, for instance.

And, you know, we are as a whole a technical team. So we’re interested in ethics, we’re interested in some of the more moral side of how you apply these things, but actually fundamentally we’re technical people. And so we defer to the people with the domain knowledge to inform us about the sort of constraints in which we’re working and the important considerations in how things are applied. When it comes to the more sophisticated models such as, you know, large language models or generative AI, I think it has to be a collaborative process. So nobody, arguably nobody, is an expert at this and nobody fully understands some of the implications of how these models can be used or applied. And I think making sure that we allow time to have really difficult and thoughtful conversations both with our team and the people we’re working with and a variety of stakeholders. So not only the sort of more senior stakeholders, but people who will be impacted by anything that we implement. That’s what’s really important, and understanding that nobody has a correct answer and it has to be a process, it has to be a conversation, and we have to be prepared to change our mind and not get stuck in a kind of dogma around what’s good or what’s not. But yeah, it is difficult for sure.

Tom: So NICD was really designed, was really put in place, to sort of help initially just the small and medium companies around Newcastle. Right. And then slowly sort of spreading out to all over the north of England. And this is because a lot of the data science expertise initially went to London, and it’s sort of building that out across the country to make the rest of the UK more competitive. It’s a very interesting approach, especially when you talked about how you engage with companies and you’re not doing it for them, but you sort of teach them how to do it and then they own a lot of the intellectual property and they own basically the process and the know-how. We sort of have a country where there’s a few very, very large companies that are sort of global companies like or mining companies, and then there’s a whole bunch of small and medium companies where the same sort of challenges arise, where there’s a lot of potential for data to make a difference. But there’s nothing like that here. There’s companies, but, you know, like anyone knows, if you go to a company, a lot of times you don’t own the IP or they sell you something because they really want to sell you what they have. Do you think we should do that in Australia?

Hollie: Yeah, I mean, I’m certainly a massive advocate for it as a model. I think, you know, I’m not as familiar with the business landscape in Australia. And I think we’ve worked with large organisations, large companies, and we’ve worked with non-profits through to public sector. But actually where I think we have the most impact is with that not so small, medium enterprise kind of organisation. And a lot of the time they’re trying to provide utility that supports those larger organisations. So it’s kind of a win-win situation for both of those. You know, if you have more effective SMEs offering more competitive, more valuable products, that obviously positively impacts the large organisations as well.

I guess the question is whether those organisations are struggling to recruit based on, you know, money, based on availability of skills. You know, I think in the UK we’re moving into a place where actually there’s a lot of people who are now qualified in data science through a master’s or actually an undergrad degree. And part of the challenge is that organisations are wanting to employ people with a large amount of experience. And so we’ve got the skills, but we haven’t got the experience. And so again, we’re finding ourselves bridging that gap a little bit in helping people who are new into data science, maybe through a master’s or through a bachelor’s, to develop kind of on the job skills on projects. I suppose it depends where you are in that journey in this region. Yeah, I think it’s worth doing. I think it’s a really nice way of building a culture of data in a kind of business network as well. If more people are doing data and it’s not just constrained to those more tech focused organisations.

Tom: What is sustainable data science or clean AI? And do these frameworks have impact on your work at NICD?

Hollie: I’m going to answer this in the way that I would think about it. Large data driven AI, machine learning, it requires huge amounts of computational power and it’s a very extractive technology, right? So we need to think, is what we’re doing worth it? And, you know, we’re living in a very challenging time where climate change is at the forefront of, I hope, most people’s minds. And it impacts a lot of what we do and it should start to impact that even more. So when we’re using technologies that inherently have an impact on the planet and the people who live on that planet, we need to think about whether we’re doing that in a sustainable way. You know, there’s the big things we can do and the small things we can do. But I think again, thinking about smaller organisations who are maybe implementing, you know, a kind of fairly sort of straightforward AI system or machine learning system, whatever that might be, things like thinking about where your cloud resources are sitting. Okay, so in some countries, your cloud resources are drawing on coal energy; in some countries your cloud resources are drawing on green energy. And that’s a really simple choice you can make. Whether you think that really makes a big difference, you know, isn’t that the conversation? But there are small things you can do to try and improve your impact, other things like just not using a model any bigger than you have to. So some of these large generative models are using absolutely mind bogglingly insane amounts of data. Right? But that’s just how it is. And there is research in trying to cut this down.
So rather than thinking how can we continue to make this model bigger and better and bigger and better, we’re thinking, how can we keep the performance the same, but actually reduce the amount of data required or reduce the amount of computational power required, and start to bring things in that direction instead of constantly thinking about growth. That is obviously something that doesn’t apply hugely to a small organisation.

But again, having that in your mind when you’re choosing models to use, and thinking about what is sort of the minimum we need to be effective, that can really help you be a bit more sustainable. I guess the other element, and it’s maybe not so much sort of sustainability, is just thinking about the way that the work you’re doing, the data science you’re doing, is impacting different groups of people. Ultimately, to be sustainable it has to be fair, in my mind. And that’s a really important part of it. And that ties really closely, I think, into the sort of green approach. So what is sustainable for the planet is going to be more sustainable for people in general, but I think it’s difficult when you’re in a situation of being under pressure to drive growth in an organisation, and obviously there’s money at stake, to think about the impact that your decisions have in other areas as well.

Rhetta: And I imagine that people come to you because they’ve got a need. They need help with their data. They need help with their kind of their skills. So then is this even something that they’re thinking about when they come to you? And I’m not saying they should or they shouldn’t, but I think it’s quite an interesting thing because we work with a lot of organisations and that’s generally not a conversation that comes up unless it’s very specific. And they’re working kind of in that sustainable climate change space. 

Hollie: Yeah, I think people do care. I would suggest that in my experience it’s more that people don’t realise that there’s an option there. So I think this idea that you can do data science in a more sustainable way maybe isn’t even in people’s minds. It’s not that it’s not a priority, but you need to know what your choices are. So I do think that’s something for our own sort of education, both around, you know, the stuff we’re talking about every day around ethics and morals and how we can do responsible AI, but also how we could do more green and more sustainable AI.

Rhetta: And I guess kind of shifting gears now, I’m kind of curious to ask you some questions about you and your experience. There is a famous article that came out ages ago that kind of rated data science as like the sexiest job you could have. What do you love about being a data scientist and working in this space?

Hollie: I love the fact that I’m not a domain specialist in anything and I get to experience so many different kinds of domains. I appreciate that’s not true for a lot of data scientists who are maybe sitting in a particular industry. But the nature of the whole idea is that I get to work with, you know, all sorts of things, you know, health care, energy, transport, defence, electrification, all of these different things. And you get a glimpse into all these different worlds.

And what’s really nice about doing data science in all these different worlds is you realise that a lot of the problems faced by one sector are actually really very similar to the problems faced by the other sector, but they don’t talk to each other. And so you’re in this unique position of getting to help different sectors learn from each other in a very kind of indirect way. For me, that’s really cool. It’s really fun. I love chatting to people about things that they’re experts in, and I get to do that every day.

Rhetta: I guess, kind of following on from that, our team’s quite young and quite diverse, and we also work with a lot of student interns. Is there any sort of advice you would give to people looking into getting into this space?

Hollie: I think for people trying to get into this space, and particularly getting into data science, I would say that your skills that aren’t data related are so valuable, and don’t let go of those in trying too hard to be a data person. I think you find that as soon as you start working with other organisations and doing more interesting projects, actually, those skills you’ve developed, whether it’s through studying something completely different or through work experience or through your personal background, those things have a lot of value. And it’s quite easy to think, it’s not important, I better study all the technical things, and sure, that’s good too. But those things in combination are what really make a good data scientist in my mind.

Tom: Because it is such a fast-moving space where constantly there’s new algorithms. A year ago we didn’t talk about the large language models. Now everyone’s talking about it. How do you stay on top of these trends?

 

Hollie: With difficulty. I mean, the things we do at NICD, we’re quite lucky in that we reserve time on a Friday afternoon and we sit and we’ve done things like what we call an internal hack, where somebody would bring a new technology or a technique or tool or something they’ve learned or maybe a problem they’ve got and they’ll bring it to the room, and we’ll sit and sort of play with it together and try and write some code, try and fix something, which gives us the opportunity to learn from each other. We’ve also done things like a paper reading group where we take turns to propose a paper, and it might be a kind of a classic paper on something that we talk about all the time, but it might be something that’s just come out. And rather than everybody having to read academic papers all the time, because we’re not academics, you know, it’s a lot of work, it means one person can put some time into that and feed back to the rest of us, and it just allows everybody to be just a little bit more kind of on top of what we’re doing.

I guess the other side is that, you know, we’re not a huge team and we’re all, again, fairly young. We’re quite engaged and we just share things informally. And I think there’s an excitement about new things coming out that allows us to sort of share and learn as we go.

Rhetta: What I want to ask is a question we ask all of our guests, but again, given that you work in this sort of interesting space like we do, where you’re not necessarily in one area, you’re one expert and you want one set of data to answer one particular question. Given your range of projects that you’ve worked on and kind of the range of data that you’ve worked with, and maybe this even goes into your PhD prior to when you were at NICD, is there a particular data set or type of modelling that you’d like to work with and haven’t yet or really enjoyed working with and would like to share more with other organisations?

Hollie: You know, in the last couple of years I’ve worked with a few organisations peripheral to the transport space. And you know, I would love to see a world where I could play with a transport database for the Northeast that combines, you know, your buses, your ferry, your metro, your trains, pedestrian cycle routes, all of these things, because at the moment, in practice, it’s so fragmented. I’m not sure what data exists, I’m not sure who owns it. And I think the amount of potential that could come from the ability to combine these and work in this space would be massive, even if it means I can get to work more quickly. Right. Like it would be really, really great. But it’s very kind of a nuanced subject and there’s lots of different attitudes and approaches and different ways of thinking about what makes a good transport system. I think perhaps the challenge that we have in the Northeast is that there’s a lot of different transport companies that are private and nobody wants to share data with anybody else. You know, fair enough, I guess, for commercial reasons. But the potential for all of those organisations and companies, if they could share data, if there were someone central told to look after it for them, then perhaps that may be a step forwards, which I think is what you guys are doing really well with. But you know, we’ve got some way to go in that. And I hope that if we can combine these datasets, maybe that would help inform that.

 

Rhetta: Well, thank you so much, Hollie. It was absolutely great chatting with you today. And thank you for your time.

Hollie: Thank you. 

Rhetta: To listen to more episodes of Show Me the Data, head to your favourite podcast provider or visit our website ridl.com.au and look for the podcast tab. We hope that by sharing these conversations about data-informed decision making, we can help to inform a more inclusive, ethical and forward thinking future. Making data matter is what we’re all about, and we’d love to hear why data matters to you. To get in touch, you can tweet us @G_RIDL, send us an email or, better yet, follow, subscribe and leave us a five-star review. Thank you for listening. And that’s it. Till next time.
