Show Me the Data: Episode 1

SHOW ME THE DATA PODCAST

EPISODE 1

DATA PRIVACY IN 2021: EMBRACING THE PARADOX

CONVERSATION WITH MICHELLE TEIS AND DR TOM VERHELST - 16 minutes 14 seconds

“Data can be either useful or perfectly anonymous but never both” – Ohm, 2009.

This sober conclusion encapsulates a realisation we contemplate regularly as data analysts, scientists, data practitioners and decision makers.

Acknowledging and balancing the tradeoffs between useful and anonymous data and the related ethical implications of accessing, sharing and storing these data is a part of each and every data partnership we form.

Ethical frameworks are living documents and outline ways of working. While there are some standard components, ethical frameworks are applied differently depending on the contexts in which, and by whom, they are applied.

How then do we protect people’s right to privacy while working transparently to address complex and highly nuanced socio-economic problems? How do we balance this trade-off between useful and anonymous data? What does working with data ethically actually mean and where do you start?

EPISODE TRANSCRIPT

Rhetta Chappell (host): Hi, and welcome to Show me the data, a podcast where we discuss evidence-based decision making and the ways in which our lives interact with and create data. I’m Rhetta, your host for today, and I’m a data scientist at Griffith University. Show me the data acknowledges the Jagera peoples who are the traditional custodians of the land on which we are recording today, and we pay respect to the elders past, present and emerging. Today, I have the pleasure of speaking with two guests, Michelle Teis, who is a partner at GWI and Dr Tom Verhelst, who is the director of RIDL. Michelle is an enterprise information management strategist and she’s all about maximising the use of data to improve decision making and reduce complexity. Today’s episode is about embracing the paradox of data privacy and ethics in 2021, and both Tom and I are looking forward to this and I’m sure we’re gonna learn a lot. Hello, and welcome Michelle and Tom, really happy to have you both here today at Show me the data. Michelle, it sounds like some of the work that you’re doing at GWI has some interesting parallels and similarities with some of the work that we do at RIDL. And I wanted to ask you specifically about if you could talk about some of the projects that you’re doing in this social good and social impact space?

Michelle Teis: Absolutely. So, GWI is a boutique management consultancy that specialises in only two areas. We deal with digital transformation in the way that people work with data. But we also deal predominantly with information and data. And we help organisations solve problems with data and the information that they collect. And we’re particularly passionate about doing work that matters. So we work in the child safety space, we work with homelessness data, we work with the public sector to help them improve and use the data that they’re working with. In fact, we work across all industries and sectors.

Rhetta: That’s really interesting. We kind of do similar things in the sense that we’re trying to use evidence-based solutions to address complex socio-economic problems. And we’re using the place, rather than like agency or discipline, as the data linking framework. Do you find in the work that you’re doing that you’re also kind of taking this place-based approach? Or are you looking at data from a different perspective?

Michelle: We do take a place-based approach with some of our work, but not all of it. Some of it is really just about helping organizations maximize the data that they have, and getting them to recognize the value of what they can do with that data.

Rhetta: Yeah, that’s great. You recently wrote that we all need to start applying an ethical framework to decision making and that we really need to be doing this now, or maybe even yesterday. Could you explain what an ethical framework is, and maybe give us an example or some examples of how smaller organizations or government organisations might practically implement one?

Michelle: Absolutely. So ethics is all about doing the right thing. The core tenet of ethics is to do no harm. And that means no harm to individuals, or the reputation of organisations. For everybody, an ethical decision is slightly different. And an ethical framework helps to guide decision making in an organization to ensure that there is consistency across the decisions being made. So it’s a principles-based approach normally, and the principles align to a company’s organisational values. And it just pushes people to think about: are we doing the right thing? Could we do it differently? Are we using data the right way? Are we being transparent? Do we have balance in what we’re doing? Is there proportionality in the amount of data that we’re collecting? So small organizations can easily put in place a framework that they incorporate into all of their data-focused projects, so that they’re forcing an ethical lens on all of their decision making.

Dr Tom Verhelst: And as the ethics of a company changes, because the ethics of a company is sort of based on the ethics of the society it sits in, and the morals and what’s acceptable sort of change, how do you see that? It’s sort of like a challenge, right? It’s a bit like, you post something eight years ago on Twitter, which was then sort of acceptable, and now it’s like, you know, you’re a horrible person.

Michelle: Happens all the time. Yes. So look, ethics frameworks are living documents. They need to be reviewed by organizations all the time to ensure that they do align with the changing values of the community, but also with the way that organizations evolve and how their own values change. So you know, what you may think is ethical might be different to what I think is ethical. And so what an ethical framework does, or a data ethics framework, is just help guide consistency and make sure that we’re both thinking about the same things. And hopefully, we will come to a similar conclusion.

Tom: We have recently been working with sectors that usually aren’t as mature in the data journey, partly because they haven’t had the financial resources and the width, and gradually the data space is becoming cheaper. So it becomes more accessible for the sort of non-standard sectors. We’ve been trying to set up data collaboratives, so data cooperatives in, like, the community space, the human health services space. And basically, what we’re trying to do there is, you know, you have these sectors like finance, where they share some data, they make insights across sectors. You have the same in big oil, in mining. We’re trying to do the same within the sort of community and services space. It’s way more sensitive, as in, they usually focus on people that are naturally disadvantaged, otherwise they wouldn’t be focusing on them. So often it’s sensitive data; before you can sort of make any value out of it, it has to be sort of the sensitive data. And so we’re talking about, you know, the cases of people and how they move from one organisation to another, and then how those organisations work. You know, working in those organisations is very challenging. People have higher burnout rates than in other organisations. We’re really still trying to sort of figure out how we should work with that sort of sensitive community data. Do you have any advice on the ethical questions that we should be asking in those spaces, like when we bring these people together to share the information? What shouldn’t we be missing?

Michelle: So first of all, I’m a big fan of data cooperatives or collectives in that I think the use of shared resources and shared data can lead to common, or consult common interests and bring great benefit to those people that are members of the collective. And ethically, it actually changes the dynamic of what you’re doing with data. So when you bring people together into a cooperative, you have different ranges of stakeholders. And so you need to consider, from an ethical perspective, who all of those stakeholders are, and whether the decisions that you’re making or the products that you’re creating are gonna adversely impact any of those stakeholders, and whether or not there’s any bias brought in that would impact one particular part of the cooperative above another. You need to look at the transparency of what you’re doing, and make sure that people understand exactly what is happening with data, particularly if it’s personal or sensitive data. So have they given you consent for primary use, or primary collection use? And then are you using it for a secondary purpose? And if so, are you being transparent about that secondary purpose? And does everyone understand what the outcomes of that could potentially be? So there’s lots of things you need to consider: how much data you’re collecting, whether or not you’ve got more data than you actually need. Have you got the minimum set, so that you’re not sort of over-collecting data and you’re not using it inappropriately? Looking at third-party access to that data, considering, you know, who should access it and whether or not they have the right to see the data that you have. So there’s lots of things you need to consider.

Rhetta: In most of the work that we do, and this kind of ties into this data collaborative work, our idea is that we’re taking individualised data, but then we’re aggregating it up to some level that makes it so that you can’t individually identify different organisations or different people. How do you balance this trade-off between useful and anonymous data? So it’s kind of playing on that idea of, like, the privacy versus utility trade-off. And how do you balance this in the work that you do?

Michelle: It’s one of the biggest challenges that we have today, I think. When you start to anonymise data, you do reduce the utility of that data. And so it’s about understanding the context of what you’re trying to do, and the context in which you’re collecting data, and the context in which your stakeholders operate. And I think you need to find a balance. So the more you anonymise data, people believe that you’re actually protecting the personal information of individuals. That’s not always the case when you are aggregating. It is always possible these days, because of the volume and the veracity of data collected and the advancements in technology, to re-identify people in that set. And so that’s where your ethics comes into play. You actually need to sit down and put that ethical lens over what you are doing with the data, how you’re planning to use it, whether or not there is potential for harm, whether or not there’s balance and proportionality, whether any bias is creeping in, and overall, respecting people’s privacy.

Tom: I think you made like a very interesting point, because there’s the start of like a conflict arising. You’re very right, like you should ask yourself, what are you collecting? And are you not over-collecting? But then to be able to answer questions around, you know, his drive around, reduce the bias or afterwards asking different questions, it sort of creates tension. Because you don’t want to over-collect, you don’t want to look at the most sensitive. But then if you abstract, or if you’ve scaled down what you’re collecting, you’re sort of putting yourself a bit at risk with your conclusions or your analysis.

Michelle: Yeah, you’re also reducing, you know, potential for greater outcomes. But it’s just a matter of finding what is the right balance, and it is a current challenge, and the federal government have acknowledged that and are currently reviewing the Privacy Act. One of the considerations in that review is whether or not people should have to give consent for every potential use of their data. So instead of doing general consent to say, I agree you’re collecting my data for this use, and I understand that you might use it for X, Y, and Z, actually getting informed consent for every potential use.

Rhetta: That makes it tricky, I guess, but it makes sense.

Tom: Yeah, it’s very interesting. So it leads nicely into our next question where, like, within academia and government, very often, we have very rigorous ethical processes, which are sometimes a frustration. I mean, they have their purpose, and they’re very valid. While in the commercial space, sometimes that is not as required. I mean, there’s definitely examples, I think, in the financial sector, in the mobility data sector, where, you know, the process is not as rigid or as strict. It makes it quite interesting how sort of academia and government can sort of learn from the commercial or business sector, because they look a lot more Lean and Agile, and they seem to be answering questions a lot quicker. For you, since you worked in academia a while ago, but now in business, have you observed that too? Is that something that you think I can learn from?

Michelle: Absolutely. So I’ve had the privilege of working across both academia, the commercial world, and also in the government, non public sector, and all three of them have things that they could learn from each other. So academia has, you know, a long history of collecting research and doing research, and they’ve always known how to manage research data. Doesn’t mean they necessarily always do it. But they’ve always known the rules around research data, and, you know, establishing good governance about that, storing it in the right way, having human ethics committees and taking a risk approach to the use of data. Universities have always been great at that. In the commercial sector, they’re very revenue-driven. And so they’re trying to do more with data for less, because revenue is the primary focus. So they make their decisions a lot quicker, they process things a lot quicker, they don’t have the detailed level of structure that government has. And on the other side, government’s very compliance-driven. So they’re slow to make decisions. They’re hesitant to share data, because it’s driven by taxpayers’ money, and they want to make sure that they are doing the right thing by the taxpayers, and therefore there’s a lot of red tape around their decision making.

Tom: Yeah, I’ve really observed some very interesting things. The last year in particular, I’ve seen two departments buying data that basically would standardly be provided by census. But they wanted more accurate, they wanted more contemporary. They’ve really paid a considerable amount of money to companies to provide them with that sort of information, which I think is very interesting. It’s a very interesting trend. Because you’re right, they are slow in their collecting of data and their sharing. So they’re sort of looking at business to provide insights for government to make decisions.

Michelle: But they’re also recognising that data holds a lot of power, and therefore access to the right data can really inform decisions and make for a better public service.

Rhetta: So, Michelle, to finish up, we’re going to ask you a question that we’d like to ask all of our guests. And I’m not sure if it’s better to think of a specific project or just in general. If you could get access to one data set, and I guess, based on our conversation, maybe ethical and policy concerns aside, that would really unlock some really fruitful insights for you, what would that dataset be?

Michelle: Great question. For me, it would be around consumer behaviour. So, understanding why people do things when they do things and looking at trends and patterns across communities. I think that that would be incredibly informative. It would inform retail patterns, it would inform education patterns, it would inform, you know, job roles and changes in job role. I just think I don’t know if that data set actually exists. But yeah, consumer behaviours are incredibly interesting.

Rhetta: So maybe some banking data, but we almost need every bank’s data like you mean like patterns or….

Michelle: Imagine that, banks sharing data. Yeah. Yeah. Yeah. But yeah, consumer patterns, really interesting area and you know, so many different opportunities would come out of them.

Rhetta: Great. Well, thank you so much for your time, Michelle, and you too, Tom. And that’s it. To listen to more episodes of Show me the data, head to your favourite podcast provider, or visit our website, RIDL.com.au, and look for the podcast. We hope that by sharing these conversations about data and evidence-based decision making, we can help to inform a more inclusive, ethical and forward-thinking future. Making data matter is what we’re all about. And we’d love to hear why data matters to you. To get in touch, you can tweet us @G_RIDL, send us an email or, if you prefer, just send us a letter by carrier pigeon. Thank you for listening, and that’s it til next time, take care and stay safe.