Practical AI – Episode #93

Roles to play in the AI dev workflow

get Fully-Connected with Chris and Daniel


This Fully Connected episode has it all: news, updates on AI/ML tooling, discussions about the AI workflow, and learning resources. Chris and Daniel break down the various roles to be played in AI development, including scoping out a solution, finding AI value, experimentation, and more technical engineering tasks. They also point out some good resources for exploring bias in your data/model and monitoring for fairness.


Sponsors

DigitalOcean – DigitalOcean’s developer cloud makes it simple to launch in the cloud and scale up as you grow. They have an intuitive control panel, predictable pricing, team accounts, worldwide availability with a 99.99% uptime SLA, and 24/7/365 world-class support to back that up. Get your $100 credit at do.co/changelog.

The Brave Browser – Browse the web up to 8x faster than Chrome and Safari, block ads and trackers by default, and reward your favorite creators with the built-in Basic Attention Token. Download Brave for free and give tipping a try right here on changelog.com.

Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com.

Rollbar – We move fast and fix things because of Rollbar. Resolve errors in minutes. Deploy with confidence. Learn more at rollbar.com/changelog.


Transcript



Welcome to another Fully Connected episode of the Practical AI podcast. This is where Chris and I keep you fully connected with everything that’s happening in the AI community. We’ll take some time to discuss some of the latest AI news, and we’ll dig into a few learning resources to help you level up your machine learning game.

I’m Daniel Whitenack, I’m a data scientist with SIL International, and I’m joined as always by my co-host, Chris Benson, who is a principal AI strategist at Lockheed Martin. It’s been quite a season in our lives, Chris…

Oh, boy. 2020 has definitely had an impact on my life.

Yeah, definitely. We would not be right to just ignore everything that’s happening in our world as we enter into these conversations. Of course, we’ve got the unrest that’s happening not only in our country, but around the world, as a result of injustices, police brutality, and systemic racism… And then that’s piled on top of Covid-related things, and that’s piled on top of the economic impact and fallout of that, and unemployment… Of course, these things are not separate from AI things, and I think it’ll be years of fallout from everything that’s happening.

Yeah, totally.

It’ll impact our conversations.

It will. It’s all real life, and… A couple of thoughts there. You talked about the injustice of what’s happening, and Black Lives Matter being able to come back out and be meaningful in this discussion, which I think is fantastic… it’s a time of change right now, it’s a time of massive shift, and I know it impacts everybody in the audience. I know for me - you mentioned Covid - and you and I had talked a little bit about it before the show, so I’ll share very briefly with the audience what’s happened recently to me. I’m actually choking up a little bit…

So my mother-in-law recently died of Covid, so it’s impacted my family, and I just wanted to share that with the audience. I’ve been kind of missing in action for a little while. I know you did an episode with DarwinAI recently, and I thank you for doing that… And I know with the unrest we paused the show briefly… But I’ve been kind of out of action.

[04:19] I just wanted to let folks know - so many people hear about Covid in the news, but it hasn’t touched their lives in a direct way… And speaking as someone whose life it has touched directly, it is a serious disease, so I just hope everybody will follow the safety guidelines and be aware. When you lose someone that you love, it changes how you see it. When you have other family members that have it, you worry about them; and when you have a whole family in isolation, it makes a difference.

So stay safe, people, I appreciate it, and I just wanted to let you know. It’s real, and it’s touched my life, and thank you for letting me say that. It was important to me.

Yeah, thanks for sharing, Chris. I know it takes a lot to share that as well. My thoughts and prayers have been with you and your family… And yeah, I think it’s just another data point to motivate people to, like you say, take things seriously… But also, for the AI community - a lot of the people that listen to this podcast - there are many meaningful ways that people can contribute, whether it’s on the Covid and virus-related front, on the racial injustice side of things, or on the economic side of things… Of course, there are community things that we can all do - being good neighbors, caring for people… But also, being tech people, being AI people - there are some real intersections with AI technology.

Of course, on the policing front we’ve seen increased usage of things like facial recognition and other technologies that are concerning for various groups, and algorithmic decisions that are impacting certain groups… On the virus side of things there’s a whole bunch of AI people trying to come up with beneficial applications to help that scenario - not necessarily all predicting Covid outcomes, but helping people get the right information. We had the episode with the Covid QA group that was working on that, and we also had an episode about the CORD-19 dataset… So there are ways that AI people can contribute, in terms of data annotation, in terms of coding, in terms of jumping into open source projects… So I’d really encourage people, if you’re interested in those things, or wanting to know how to contribute, or wanting to make your voice heard on good AI ethics-related things, reach out to us on our Slack team. You can find us at changelog.com/community, or on our LinkedIn page, or on Twitter… We really are wanting to have some discussions around these topics and point people to good resources… So I’m really hoping that people reach out and find some of those ways to contribute.

I’m so glad that you brought up not only the practitioner side of being an AI professional or enthusiast, but also the AI ethics side. As I’ve talked about before, I’m very involved in AI ethics… And as we talk about injustice - with both the technical skills you have and the incredibly deep, rich thinking that we hear from people in this community - you have a voice, and you can shape the future. This is really something that we have a role to play in…

Definitely.

…so I am asking our listenership to engage. Engage with these issues in real life and bring your expertise and your skills to bear on this.

[07:58] Yeah, and later on - normally in these Fully Connected episodes we take some time at the end to share some learning resources… I’ve pulled in a few that I’ve run into over the years related to bias and fairness in AI. We’ll talk about those later in the episode, and maybe some places where you can find out about those things. But before we get there, we do wanna acknowledge that there are a lot of encouraging and exciting things coming out in the AI community, in terms of advancing various efforts and toolkits. One of the things we wanted to do in this episode was highlight a couple of those.

The first of those that I saw, which really excited me, was the announcement from Streamlit that they closed a Series A funding round of 21 million dollars, which is kind of crazy. If you remember, we had Streamlit on the podcast (that was episode 66), and we talked all about the Streamlit project… So we definitely recommend you go back and listen to that. But in general, I think Streamlit is an incredible project. I don’t know if you’ve been following it at all, Chris…

Certainly after we had the conversation with the team I’ve found it incredibly inspirational. Streamlit is an open source framework to turn Python scripts into interactive apps, and I know prior to us engaging them I wasn’t really aware of that, but it’s a super-cool approach, and it’s showing the creativity… So yeah.

Yeah, so I know for me – well, I don’t know if I wanna say “absolutely zero”, but I don’t have much exposure and experience in terms of frontend engineering or building actual graphical interfaces, or web apps, or anything like that… At the same time, often when you’re trying to integrate a machine learning application into a business process, there’s a very human side of that, that becomes very difficult if you aren’t able to let people interact with what you’re building in a visual way.

I’m thinking right now - I attended a couple of workshops recently on active learning and human-in-the-loop methods… So you could have this scenario where maybe you’re working on what those workshops were talking about - machine translation applications… And sometimes when you deploy that, you might want a model in the loop that tries to identify bad translations that your machine translation application is producing, and then have a user actually review and correct those… So you’ve got this kind of graphical piece, but also the user piece - potentially a non-technical user - that’s interacting with that… So yeah, I see that scenario popping up all the time, and Streamlit I think fits right in there, which is why it seems to me that it’s getting a lot of attention. It’s an often-seen pain point that isn’t really dealt with well.
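
For illustration, here’s a minimal sketch of that model-in-the-loop routing, where low-scoring translations get flagged for human review. The quality_score() model and the 0.7 threshold are hypothetical stand-ins, not from any specific toolkit:

def quality_score(source: str, translation: str) -> float:
    # Stand-in for a real quality-estimation model returning a score in [0, 1].
    return 0.5

def route(source: str, translation: str, threshold: float = 0.7) -> dict:
    # Low-scoring translations go to a human reviewer; the rest pass through.
    if quality_score(source, translation) < threshold:
        return {"status": "needs_review", "source": source, "mt": translation}
    return {"status": "auto_approved", "mt": translation}

print(route("Hola mundo", "Hello world"))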

I agree. And not only that, but I think that our community often undervalues what that user experience should be. We go straight to the model, we go straight to the training and talk about the latest algorithms… But when you’re doing this stuff for real in production, and you have a community that is expecting performance from you, not delivering the appropriate type of user interface and user experience really detracts from the work that you are doing in AI. If you don’t have those skillsets - either yourself, or in the people you work with - you can lose the value in something that would otherwise be great very quickly.

[12:00] As I’ve worked in a professional context, that point has been driven home to me over and over and over again… So I tend to approach AI from a user perspective, even if I’m the developer doing the work.

And Streamlit is talking about the ways they’ll use this money. They want to extend the application… And I should mention too, this is an open source application that you can use; you just pip install streamlit (if I remember right) and then run it locally… And they have a whole bunch of different customizations that you can add - little sliders, text input, file upload, plotting - and all sorts of ways you can configure it. So when they talk about extending that, they mean the customizability of it, and customized layouts… They also talk about building in programmable state…
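
As a rough sketch of how little code that takes - assuming Streamlit’s standard widgets; the CSV schema with a 'confidence' column is made up for the example:

# app.py - run with: streamlit run app.py
import numpy as np
import pandas as pd
import streamlit as st

st.title("Model demo")
threshold = st.slider("Confidence threshold", 0.0, 1.0, 0.5)  # slider widget
uploaded = st.file_uploader("Upload a CSV of predictions")    # file upload widget

if uploaded is not None:
    df = pd.read_csv(uploaded)
    st.write(df[df["confidence"] >= threshold])  # hypothetical 'confidence' column
else:
    st.line_chart(pd.DataFrame(np.random.randn(20, 2), columns=["a", "b"]))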

One of the things I was curious about - because I’ve always used Streamlit just as an open source application - is that if they’re raising money, they’re obviously a business… So I think the other thing they’re going to devote that effort to is Streamlit for Teams, which in my understanding is some combination of sharing and deploying Streamlit apps in a secure way - apps that are actual production applications, and not just demos, proofs of concept, or little tools.

Yeah, I’m looking forward to seeing some of the other things they choose to do as they go into this new phase… So we may have to revisit with them at some point as they get some of this work done and are able to use that capital well.

So I know you had a topic that you wanted to go into, which I think is a good one… But before we do that, I just wanted to mention one other thing that I saw just before recording today, as I was scrolling through Twitter: GPU-accelerated training is now supported in the Windows Subsystem for Linux. I have to admit, I have not been a Windows user for quite some time, but in my understanding, there are quite a few of them out there.

There are, yes. There are a few.

Yeah, quite a number… And I know for example – like, when I taught a couple courses at Purdue over the last few years, of course the lab machines there are Windows machines, or at least some of them… So it was always a struggle for me to figure out the best ways of doing AI experiments and programming in that environment… And mostly that’s just my unfamiliarity with that whole world. But yeah, this is pretty cool.

[15:47] So the Windows Subsystem for Linux, or WSL, enables Windows users to run a native, unmodified Linux kernel (and Linux command line) directly on Windows. That’s pretty cool in and of itself. But now the next step is that they’re adding GPU acceleration to that, and connecting things up nicely to CUDA, and those sorts of things.
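
One quick way to verify that plumbing end to end is a sanity check like this - a sketch, assuming a CUDA-enabled PyTorch install inside WSL with the Windows GPU driver set up:

import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    print("Checksum:", (x @ x).sum().item())  # tiny computation actually run on the GPU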

Yeah, and I think that’s great. And like you, I have not recently been in Windows - once upon a time I was, but I moved away… I’ve been hearing they’re really embracing open source in recent years though, and that’s definitely brought me back around. You know, I wouldn’t have considered them for a while, before they took that approach… So total kudos to Microsoft for making that very hard turn - it’s hard to steer a big organization in a very different direction… I’ve been very impressed.

I think it’s a fantastic step forward to have the GPU support in that… And the funny thing is, I keep hearing from people who are using the Windows Subsystem for Linux that it’s incredibly usable. I work in an organization that has a lot of Windows users… So I’m getting really good feedback on the work they’ve done, and on being able to utilize that Linux kernel; it’s not a second-class citizen, as I understand it. It really does a good job. So announcing that they have GPU support may change the landscape a little bit as that gets adopted over the next couple of years.

Yeah, I think the ability to run unmodified Linux things on Windows - that part rings true right away for me… And it’s cool that you can do the GPU-accelerated stuff. In terms of my own workflows, often I don’t have the GPU in my laptop or sitting on my desk; I’m using it on either a remote computer or in the cloud… So in that case, if I was on Windows, the important thing would be the command-line stuff, and scripting, and all those sorts of things that I could do in the way that I’m used to…

But I know also that people build a lot of great gaming computers, for example, that are Windows-based. This is where my mind’s kind of going with this - there are all of these gaming computers out there with GPUs, and the games, for the most part, are running on a Windows system… And I’m not a gamer, so I’m really speaking outside of my domain. But it seems like now this would make it easier to buy an off-the-shelf gaming computer or gaming laptop that’s Windows…

Yeah, I would think so.

…and then use the GPU on that for AI purposes. Whereas before maybe you had to buy that, then install Linux and figure out all the drivers, etc. Maybe this makes that process easier, I’m not sure…

I would agree. I’m not much of a gamer, but I think that makes a lot of sense. I actually think I’m probably gonna try Windows Subsystem for Linux out in this context… Yesterday – I didn’t have a chance today, but yesterday I was logging into a dedicated DGX-2, and I got all 16 GPUs for myself…

Sweet…

…and that was a lot of fun, doing some work on there.

That sounds like a lot of fun.

It was a lot of fun. And so I might have to pull out a Windows laptop and do the same thing. I did it from my Mac going in… But yeah, I think I’m gonna give it a whirl.

You could write a blog post about a Windows laptop versus DGX-2. [laughs]

There you go.

I figure there will be a clear winner, but it would be interesting to do the comparison.

Well, I can start on the Windows side and use that as a client, then log into the DGX… We’ll use both systems. We can make that work.

Yeah, sure. Cool. Well, let us know; I’d be interested to hear from people if and when they start getting into this Windows mix of things. But moving on, I think you were mentioning a topic to me that I think is pretty interesting, and oftentimes very confusing for people. I know that we’ve touched on it before… Do you wanna mention what you were thinking there?

[20:06] Sure. I do quite a bit of mentoring for people - not only at my employer, but in general - and people will reach out and ask for advice… And probably the thing people ask about most often is how to orient their own careers toward an AI/ML focus. I’ve been pretty open that I came from the software development world and reoriented my own career some years back… And it’s completely doable. I think it’s a myth that everybody in AI is a data scientist; I think it’s a myth that you have to have a Ph.D. or some other university-based credential to get into this field… None of those were the case for me, or for a lot of people that I’ve worked with.

In a previous episode - I don’t recall which one - I mentioned the fact that because I’ve been in my career now for 25 years(ish), I was around when the web was taking off. The early part of my career was when we went from the internet with no web, to a web that was initially just academic, and then took off. And I have observed, as we’ve gone through this AI revolution, that it follows many of the same trends of a brand new field that is exploding outward.

In the beginning, people thought computer science was the thing - you had to have a computer science degree to do that. But one role changed into many roles very rapidly, and a lot of diversity got introduced… As did the skills you needed and the level of experience to do different roles - it got complicated. And that’s good. It’s a sign of maturity. And we’re definitely seeing that in this field.

So a lot of people are trying to figure out, “How do I do this? How do I fit into this new, exciting AI world? That’s where I really wanna be in the years to come, but that’s not where my education has been, and that’s not where my previous experience has been.”

One of the things that I start with when talking to people, and that I wanted to address today, is that there’s not one role out there that you have to find your way into. There are many ways in, and it might actually be a role that you’re already playing in a slightly different context. It may be that you can kind of evolve your way into this. So if you’re already working with databases and other data sources, data lakes - that’s one area that’s now very involved in the big data input that goes into these AI models.

So I really wanted to talk in a practical sense, and have a conversation about the different avenues people might be able to take to get into this fun field.

Yeah, and I think along with that – like you say, there’s a lot of jargon and job titles out there that people hear, and it might be confusing as to how they fit in… Like “data scientist” versus “machine learning engineer”, “research scientist” or “data engineer”, but maybe it would be good to talk about the various pieces of the AI workflow and where certain people might fit in in terms of a team of people working on these sorts of solutions.

That’s a great idea.

From my perspective, when you’re thinking about the workflow that often happens here, there’s an initial phase which involves a lot of problem-defining and scoping, in terms of what may or may not be possible and what might be good to experiment with or try… And also an exploratory phase of data gathering and pre-processing, and - in an interactive way - doing some model training, proof-of-concept evaluation, and validation of a certain process.

[23:55] For example, if you’re a manufacturing company and you say “We’ve got this problem on our manufacturing line and we think maybe we could stick a camera in this location and detect this problem”, or something like that… You have to figure out “Okay, what would I want as my input and output data? What’s actually gonna be fed in? Could this camera be placed? What would be the appropriate output that would actually make it useful?” and then in an exploratory way, “Could I actually gather some of the data, which would allow me to train that sort of model? And if I could gather that data, what sort of model might I go after?” All of this stuff is very iterative and fuzzy. I guess this is the fuzzy phase. I don’t know if you’d agree with me. I think a lot of these projects start out that sort of way.

They do. There’s expertise required on the front end. In real life, you don’t jump straight into model development. I think there’s this perception of “Come join us. Hop on. Pick an environment, whatever you care about, and build a model”, but there’s a whole lot of work that goes into it on the front end. Before you even get to exploring the data, you’ve gotta figure out: what is it that you think you want to build, and why? And why on Earth would this particular approach be the right approach?

And why would AI bring value in, versus some other solution?

Totally, yeah. That’s a great point. There might be five different ways of approaching a solution to the problem, and if building a neural network is the most expensive approach - and when I say expensive, I mean the amount of effort, time, and resources necessary to do it - why would you do that if you can get a result that’s just as good from some other algorithmic approach? And whatever problem you’re gonna solve, you need expertise as a domain expert on that problem area, and that might mean working with the business side of your company on what it is they’re trying to provide for customers. Because at the end of the day, that’s what a company is there to do.

And we’re just barely touching on the front end of this process. So there are so many ways to engage in this AI process that we’re talking about that don’t require a Ph.D. in data science from a top university and 30 years of data work under your belt.

Yeah. In this category of contribution, I guess we can call it - this problem-defining, scoping, exploratory stuff - I think there’s a sort of solution architect role here, where you do need some knowledge about AI systems: what is possible, what is and isn’t feasible, what’s overkill and what’s not, appropriate usage, and scoping in terms of how long something is gonna take, or how much data you might need… But those are skills that you can pick up without knowing the difference between an LSTM and a GRU, right?

That level of detail is not required, I think, for this sort of thing… Although I may not be one of them, there are people out there who I think really enjoy going into a situation or a problem, maybe dealing with a client on a shorter timescale, like a few months, scoping out a potential solution, and then passing that off to another team to actually do more of the implementation and production-related things.

Absolutely. I’m one of those people sometimes. It’s one of the things that I do in my own job… And I’ll tell you, having built up some expertise in the field, if you can go talk to people on the front end and help them figure out what it is they should be thinking about, what’s gonna serve the need, it can be quite fulfilling… And it does take some understanding and expertise in the field to be able to do that successfully. If you go in as only a business analyst, without any background at all and no interest in developing the background, you won’t be as effective at making those decisions. So strategy is a key part of the front end of this process.

[28:19] Yup. And once the problem starts shaping up and this looks like it’s going to be a valuable thing to do, there’s still that exploratory phase of getting an initial proof-of-concept dataset together and proving out that this will actually work and produce the type of value that we want… And oftentimes in this stage, the goal is a kind of brute-force solution, is how I think about it - “This thing might not be optimized in every way. It might not have the exact accuracy or performance that we want”, but all of the right things are plumbed together: the right type of data is coming in, the right type of pre-processing is happening, the right type of model is producing some result, which is then being used to create something of value.

That rough plumbing of those things together now requires some technical skill, but this doesn’t have to be a fine-tuned C++ application that runs with super-high performance on an embedded device out in the field. This is about proving out that the thing works, and developing the right type of solution. So I think it’s a more technical level, but it’s not as hardcore software engineering or data engineering as it could be.

When you say that, I agree with everything you’ve just said. The way I would express that is that AI development fits very well into an agile software development process, where you’re having to iterate and you learn from that iteration, and you make those adjustments, and then you go back. And that happens both at the model level, and it also happens in terms of how you’re going to choose to deploy and do the engineering you need to accomplish that.

I very much – and I’m gonna say something slightly controversial, I think… And that is that I think of AI development as a component of software development… Which a lot of data scientists will say “No, it’s not. No, it’s not.” But when I’m looking at it in production, and I’m looking at us actually managing that, I see it in that larger context, because all of those other activities are happening around it… So - definitely.

I really liked where you were headed with what you’re saying, Chris, in terms of AI development being viewed as a sort of subcategory of software development. I think this fits very well into the mindset of another person we had on the show, Joel Grus - we’ll link to his episode - from the Allen Institute for AI. I think he’s mainly working on the AllenNLP project. He had a lot more things to say about that, and why it’s useful.

[32:09] I definitely think that – we kind of started talking about the more technical exploratory stuff, where you’re trying to figure out what you’re gonna do, start plumbing the right pieces together, and validate a solution. You will see some difference in industry, at least from my perspective: sometimes at an organization, the people doing that are not the same people who are ultimately involved in producing the production system that’s actually implemented. And then you’ll see other organizations where there’s at least some overlap between the team that does this sort of exploratory work and the team that actually produces production systems.

From my perspective, the latter has a big advantage, because if you have total separation between those groups, then when something goes wrong in production, the production team will - maybe in a non-confrontational way - at the end of the day say, “Well, this is a problem with the solution, and the model, and the way it was developed. Not a problem with our implementation.” And then the people who did the exploratory work and validated the solution will say, “No, our solution is great. There must be something in the implementation.” No one’s taking ownership of it - and in particular, no one’s taking ownership of how robust the solution is. So I think in a perfect world there is some overlap between the groups that do those things.

I agree with you completely. And I think the reason that second group has the advantage is because they’re able to learn from those earlier processes. If you have one group doing the prototype, they’ve gone through that process and they’ve learned what they need to know… And if they’re gonna hand it off to a production-only group, well, that group is starting from zero again, or from whatever documentation came out of that first effort. So there’s certainly an advantage to preserving the learning process, which is why AI/ML development is best served in a larger agile development process.

And if you’re in that software development world and you’re hearing this, these should be familiar terms to you… And those are all potential in-roads for you, your career, and your particular interests - ways to translate existing skills and existing interests into this AI world.

There’s no point where you’re ever done. You can continue to migrate across that space by always learning, and always deciding where you wanna go next, and doing that. I think that’s crucial for career development in general, but especially in this world.

Yeah. And even in the exploratory phase, I often use this analogy, which listeners will be familiar with - a lot of AI development is more akin to cooking according to a recipe than it is some intense research and development… Even in that exploratory phase, it’s taking pieces of things that have been done before and putting them together in a unique solution… Which is very similar to producing a proof of concept in software engineering.

The difference, I think, is that there’s a sort of toolset difference that some software engineers might be a little bit uncomfortable with… In this exploratory phase you might have a Jupyter notebook that shows “Here’s how I ingest data, here’s how I pre-process the data, here’s how I train my model, and here’s how I do inference.” Then when you move into the production side of things, maybe the tooling gets a little more comfortable for software engineers, where you take that notebook and say, “Well, I’m not gonna run my notebook in production; I’ve gotta take out this data gathering piece and make it a Docker container that’s gonna run in Kubernetes on AWS. Then I’ve gotta take out this pre-processing piece and figure out how to run it in parallel over a large dataset in the cloud. Then I’ve gotta take my training piece, pull that out, dockerize that, and figure out how to run it on some GPU-accelerated infrastructure.” Those pieces still carry through, but the toolset and the way you go about it definitely changes.
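
As a sketch of that first extraction step - a notebook cell becoming a standalone script that a Docker image can then wrap - with hypothetical file paths and a hypothetical 'text' column:

# preprocess.py - one notebook cell pulled out into a standalone pipeline stage
import sys
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # The same pre-processing logic that lived in the notebook cell.
    df = df.dropna()
    df["text"] = df["text"].str.lower()
    return df

if __name__ == "__main__":
    in_path, out_path = sys.argv[1], sys.argv[2]
    clean(pd.read_csv(in_path)).to_csv(out_path, index=False)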

[36:30] Yeah, that’s a great point… At different points you may have different people involved as this process matures. It’s really common for software developers to look at a Jupyter notebook for the first time and scoff at it, and say, “No, I grew up with software development best practices. I’m looking at this Jupyter notebook and - why would you do that?” But if you’re the data scientist trying to put the model together, it’s a fantastic way of iterating rapidly. And your job at that point is not to produce production software; it’s to test and try different things out.

You may be implementing a transfer learning approach, where you’re trying to customize that transfer learning into the specific solution you need… And likewise, the data scientist needs to recognize: when you deploy it, you’re not deploying that notebook. You were using the notebook for what it’s good for, but it has to become a software component. It’s a model that’s wrapped in a software component, deployed out into a larger software system at the end. So there’s a role for all of these things… So leave your biases at the door. Leave them there. Look for why each tool and each role is so important, and recognize that. Because I’ve seen people fall down that way many times.

Yup. I know for example we had a question in our Slack recently and a discussion about “Hey, I hear all of this stuff about training, and I’m able to run these examples… But then when I try to do this inference in production, the performance is so terrible. Why is no one talking about this, or why is it hard to find resources about this?”

It’s a great question.

There definitely are resources out there, and I think the commenter said it would be great to have even a full episode about that side of things and model optimization. That is another piece of the puzzle that changes as you move later into a project. If I’m running this on an edge device in a manufacturing plant, it’s gonna have its own constraints. If I’m doing it on a mobile device, it’ll have different challenges. If I’m doing it on a beefy cloud instance, then maybe you have more flexibility… But you may have latency issues you wanna deal with in responding to people.
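
To make that concrete, one common lever among many is post-training quantization. A minimal PyTorch sketch, with a toy model standing in for a real trained network:

import torch

model = torch.nn.Sequential(  # stand-in for a real trained model
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),
)
model.eval()

# Swap Linear layers for int8-quantized versions at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized(torch.randn(1, 128)))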

That’s a great question from the listener, and I love how you led into that… I’m not sure if it’s an official term or not at this point, but we have conversations in my own circle of colleagues about this all the time; we refer to it as AI engineering. I think the thing that’s so crucial to recognize is that two years ago we were talking about the edge as kind of an exception case, because people were most often deploying into servers, or locally, or whatever… It was a standard, well-known environment. But going forward, most things will be at the edge. As models and their utility become pervasive in our society and our culture, you’re gonna see edge devices being the targets of deployment in so many different ways… So that requires that you rethink your engineering to accommodate it.

[39:50] Once upon a time, deploying software was kind of code-centric, and you’d think about just processors, and so on… But now it’s all about data. If you are deploying to some sort of mobile platform - maybe it’s an autonomous vehicle - you have telemetry from that vehicle, you have sensors in that vehicle, you have cameras in that vehicle… And providing the level of performance you need to do real-time inference requires special engineering knowledge: getting the right data, in the right way, to the right place, at the right time, so that it can be acted upon. You’re no longer working with static data that you run through a server, or something…

So AI engineering is crucial for making this stuff actually work. It’s later in the process than what we were talking about, but after that data scientist has been working in the Jupyter notebook, you’ve gotta either put it out there in the world, or it’s useless. It doesn’t do anything for you.

Yup. Another piece of this puzzle - so there’s the AI workflow and the different phases along a project, all the way from solution architecting or consulting to the very technical side of AI engineering… But you could also look at that workflow in different domains, or verticals, and it’s gonna look very different. In the manufacturing world, maybe you’re gonna be thinking a lot about computer vision, running things on edge devices, and potentially hazardous conditions where you have a lot of device issues…

In other cases, like the web space - if you have a web app that you’re dealing with, or a software-as-a-service company - then you might be running your models a lot of the time in the cloud, and maybe you’re dealing with a lot of natural language processing issues, and dialogue-related issues with customer service, and all of that… And each of those sets of problems has its own tooling, its own methods, its own community, and its own way of going about things… So I think another thing to consider when you’re surveying the lay of the land is the domain.

Like you said, this happens in software engineering too, and people have specialized in certain areas of software engineering, and AI I think will be no different. There’s a lot of specialization that can happen.

Yeah, my own experience definitely bears that out. If I look at the last three organizations I’ve been a part of, counting my current employment, I had an AI role in all three. In the first one we were working with clients, and it was server-based - kind of what I think of as a little bit old-school now. It’s funny that it doesn’t take very long for something to become old-school, because this evolves so fast… But yes, we were deploying models onto big servers that were resource-rich. Then in the next organization I went to, we were focused on warehouse spaces, and introducing robotics, and cameras, and different things that make logistics work. And that presented a different set of challenges that were specific to the domain.

Now I’ve moved into the defense industry, and I focus on autonomous platforms and other adjacent technologies… Some of the previous experience certainly carries over, but this is a new domain that has its own specific constraints and challenges. So we are definitely seeing diversity in how AI is conceived and implemented, depending on the context that you’re using it in.

Yup. Well, one thing that’s true across all of these workflows and domains is that you’re definitely going to have to deal with bias in your data and model fairness… And that brings us to the end of our conversation, where we’re gonna share some learning resources with you… In light of our current climate and what’s going on in our world, it’s only natural to share some resources on exactly those topics.

[44:07] I think one of those resources, which maybe is a good jumping-off point - there’s a nice write-up in Google’s Machine Learning Crash Course about fairness and types of bias, and I thought this was pretty interesting. Certain branches of science have similar terminology around this sort of thing… Survey science, for example, I think deals with bias a lot - populations, and those sorts of things…

So this was really helpful for me to pick up some of this terminology and examples. They actually go through reporting bias, automation bias, selection bias, group attribution bias, and others, and give examples of those types of biases and how they can creep into your data… Which I thought was incredibly useful. I don’t know how familiar you are with some of these things, Chris, but it was really helpful for me, because I was not familiar with the categories that you could think about bias in.

Yeah, totally. And in my involvement in the AI ethics space - bias is a huge part of it. It’s probably the concern that people most associate with AI ethics; it’s the thing people think about first… So understanding those different types of bias, how they impact an outcome, and how they can produce unexpected outcomes - which can be incredibly common - is pretty important. So it’s a good first way to get into that.

And kind of going back - I think it’s particularly applicable that we’re having this episode at this particular time, given the large public response to injustice. Some of these tools, I’ve already heard, are being used in unexpected ways against protesters, for instance - even ones that are not breaking the law in any way. And as we think about the different types of bias here, think about how you want these tools to be applied. Facial recognition can occur long before or after a protest event, by following people through cameras and doing automatic tracking… There’s a lot to think about in how we may wanna approach this.

I’d also encourage people - just a couple more quick mentions here - to take a look at IBM’s AI Fairness 360 website. It includes a really great breakdown of the various ways people are dealing with fairness: pre-processing of data; in-processing, meaning actual changes you can make to your model; and post-processing and monitoring of your predictions… They talk about a whole variety of things with great examples, so check that out.
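
For a taste of the metrics side of that toolkit, here’s a minimal sketch using its Python API (pip install aif360). The six-row toy dataset and the choice of 'sex' as the protected attribute are purely illustrative:

import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

df = pd.DataFrame({
    "sex":   [0, 0, 0, 1, 1, 1],  # protected attribute (0 = unprivileged)
    "label": [0, 1, 0, 1, 1, 1],  # favorable outcome = 1
})
dataset = BinaryLabelDataset(
    df=df, label_names=["label"], protected_attribute_names=["sex"]
)
metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"sex": 1}],
    unprivileged_groups=[{"sex": 0}],
)
print("Disparate impact:", metric.disparate_impact())
print("Statistical parity difference:", metric.statistical_parity_difference())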

Also, Google’s Responsible AI Practices - they have a great write-up and discussion of fairness and bias… There’s also a good project from DrivenData called Deon, which provides a nice default ethics checklist (if you like checklists) that you can start from and update, to make sure you’re checking for things like bias and fairness in a project… It can be embedded within your repository, or within a Jupyter notebook, or other places. We’ll link to all of those in our show notes. I think it’s well worth people’s time to take a look at those things and educate themselves about how bias can creep into your process.

Totally. There’s one other one that I’ll throw out, which has been useful beyond the industry it started in. As part of the process the U.S. Department of Defense went through on their AI Ethical Principles - and we had a show where we addressed that in depth previously - they went out into industry and academia and solicited feedback from many different people in the space; many of them were luminaries whose names you would recognize. If you google “DoD AI Principles”, you’ll find that they have their five, just like Google, and Microsoft, and all the other players do… But I’ve noticed recently that they’re being adopted in completely different use cases, because they’re not necessarily specific to the industry they were formed in. So that’s a really good one that I end up interacting with quite a lot.

Awesome. Well, it’s been great to have a conversation with you again, Chris; great to have you back, and I’m looking forward to our future conversations and how they’ll be shaped by our ever-changing world… I appreciate our listeners sticking with us this spring, through changes in our schedules and changes in your own lives, and being in different places than you normally would be… I’m glad that you’ve continued to stick with us, and I’m looking forward to more conversations.

Absolutely. And for my part, I just wanted to thank the listeners for bearing with us as we started the show, and for letting me share what had happened to me. In the show notes I’m also gonna include a link to my experience of Covid… So if it’s something you’re interested in, and you want to hear from somebody that’s dealt with it in a first-hand way, you can check that out in addition to the normal notes for the show.

Thank you so much for listening.
