Changelog Interviews – Episode #537

Hard drive reliability at scale

with Andy Klein

This week Adam talks with Andy Klein from Backblaze about hard drive reliability at scale.

Sponsors

DevCycle – Build better software with DevCycle. Feature flags, without the tech debt. DevCycle is a Feature Flag Management platform designed to help you build maintainable code at scale.

Sentry – See the untested code causing errors - or whether it’s partially or fully covered - directly in your stack trace, so you can avoid similar errors from happening in the future. Use the code CHANGELOG and get the team plan free for three months.

Postman – Build APIs together — More than 20 million developers use Postman for building and using APIs. Postman simplifies each step of the API lifecycle and streamlines collaboration so you can create better APIs—faster.

Typesense – Lightning fast, globally distributed Search-as-a-Service that runs in memory. You literally can’t get any faster!

Chapters

1 00:00 This week on The Changelog
2 01:45 Sponsor: DevCycle
3 04:32 Start the show!
4 07:01 10 stories from 10 years
5 14:39 Software at the storage layer
6 17:18 S.M.A.R.T. predictive analysis
7 20:49 Digging into S.M.A.R.T. stats
8 24:08 Causes of High Fly Writes
9 25:00 Collecting and storing S.M.A.R.T. data
10 29:10 Sponsor: Sentry
11 33:19 Many DCs around the world
12 37:24 Burn-in testing
13 40:45 Dollars per failure and leveraged buying
14 43:35 The Backblaze Storage Pod
15 51:13 Storage servers are HEAVY!
16 57:11 52Us, 12 high @ 150lbs each
17 59:34 Sponsor: Postman
18 1:03:12 How does Backblaze buy drives?
19 1:09:12 Determining the size of the drive
20 1:12:08 What is your capacity/usage limit?
21 1:14:39 How should homelabbers buy drives?
22 1:25:43 SSDs as boot drives
23 1:30:41 HDD vs SSD cost
24 1:33:26 How Adam buys drives
25 1:36:20 Closing out the show
26 1:38:26 Outro

Transcript

Play the audio to listen along while you enjoy the transcript. 🎧

So I’m here with Andy Klein from Backblaze. Andy, I’ve been following your work and your posts over the years; the Backblaze Drive Stats blog posts have been crucial for me, because I’m a homelabber, as well as a podcaster, and a developer, and all these things, so I pay attention to which drives I should buy… And in many ways - I may buy the drives that you’re suggesting, but it’s an indicator of which brands fail more often. But I’ve been following your work for a while. In the pre-call you mentioned your title, at least currently, is Principal Cloud Storyteller, but what do you do at Backblaze? Give me some background.

Well, I started out as the first marketing hire, a long time ago, 11 years or so ago, and it’s kind of changed over the years, as we’ve added people, and everything. These days, I spend most of my time worrying about drive stats - the drive stats themselves, the data that we’ve collected now for 10 years. So we have 10 years’ worth of drive data that we look at, and I spend a lot of time analyzing it and looking at it. And then also spending some time with infrastructure, things like how does our network work, or how do our systems work, or how do our storage pods work? So a lot of writing these days, a lot of analysis of the data that we’ve collected over the years. So that’s what I do.

I think storyteller might be fitting then, because that’s kind of what you do. If you write a lot, and you dig into the data, the analysis… I mean, that’s the fun part. That’s why I’ve been following your work, and it’s kind of uncommon for us to have a “marketer” on this show. You’re more technical than you are marketing, but you are in the marketing realm, the storytelling realm of Backblaze.

Yeah. I mean, a million years ago I was an engineer. I wrote code for a living. Then I got into IT, and the IT side of the world, got my MBA degree, because I thought that would be useful, and then crossed over to the dark side. But I’ve always appreciated the technical side of things, and if you’re a developer, you know what that is, right? You’ve got to dig in, you’ve got to find out what’s going on. You just don’t take something at face value and go “Oh, it’s great. Let’s go!” And so that’s been, I think, what drives me - that curiosity, that analytical point of view. So it’s helped me a lot, especially doing what I’m doing now.

This recent blog post, I feel like it’s almost preparatory for this podcast, because you just wrote a post called “10 stories from 10 years of drive stats data.” And this kind of does let us drive a bit, but there are a lot of insights in there. What are some of your favorite insights from this post? What were you most surprised by, I suppose?

I think the thing I’m most surprised with is that we’re still doing it. [laughs] It’s great to collect the data, it’s great to tell stories from things… But after 10 years of it, it’s amazing that people find this interesting, after 10 years. So that’s the coolest part of it all. And we’ll keep doing it for as long as people find it interesting. I think that’s the big deal about it.

But there wasn’t anything, any particular insight that just drove me, that made me say, “Oh, man, I didn’t realize that.” It’s the whole dataset together. And every time I dig into it, I find something that’s kind of interesting, and new. We’re getting ready to do the new drive stats posts for Q1, and so I’m spending a lot of time right now going through that data, and you suddenly see something you hadn’t seen before.

[08:11] Or what I really like is when others start to ask questions about it. People start asking questions, saying “Hey, what about this?” Or “I did this. What do you think?” And so we’re taking a particular article that was written a few weeks ago, on the average life of a hard drive, and we’re applying what they did to our data and seeing if we come up with the same answer, how good that answer is, and so on. So there’s always a fun insight or two, and I kind of learn something every time I go through this. So the 10 years - I could have probably put another 10, or 20, or 30, or 40 on there… But I think after about 10, they get boring.

For sure. 10 insights in 10 years does make sense. It makes a good title for a blog post; that’s sticky enough. I guess, help me understand, since you say you’re surprised by the 10 years of this data collection, how does Backblaze use it internally to make it worth it from a business endeavor? Obviously, it has some stickiness to attract folks to the Backblaze brand, and what you all do… If not - you know, I may not use your services, but I may learn from your storytelling; you’re in the trenches with all these different hard drives over the years… How does this data get used internally? How is that accomplished for you?

So that’s a really good question. I mean, almost from the beginning we were tracking the SMART stats. There were a handful of them, I think five or six, that we really looked at… And we’d been doing that since - whatever, 2008, 2009, when we first started the company. We weren’t really saving the data, we were just looking at it and asking “is there anything interesting here?” and moving forward. And that helped. The methodology we worked with was: if something throws an error, like an [unintelligible 00:09:56.02] or an ATA error, or some other monitoring system throws an error, then you can use the SMART stats that you’re looking at to decide if this really is a problem or not. ATA errors are a great example. They can be a disk problem, but they can also be a backplane problem, or a SATA card problem, or any number of other things that could be part of the real issue.

So if it identifies something - great; let’s take a minute, let’s see what it’s saying about that drive. Let’s take a look at the SMART stats and see if there’s any real data there that helps back this up. Are there media errors, are we getting command timeouts, and so on? So that’s the way we’ve used it over the years. And when we started saving it, what we could do with that was get patterns on a macro level. So not just on a single drive, but on a model level.

And so you start looking at things at a model level and you go, “Hmm, that particular model of drive doesn’t seem to be doing well for us.” And then it allowed us to begin to understand the concept of testing. So we didn’t have to wait until drives started failing; we could start bringing in a small subset of them, run them for a period of time, observe their behavior in our environment, and then if that passed, we would buy more of them, for example. And if it didn’t pass, then we would remove those, as the case may be, and move on to a different model. But we always wanted to have a wide breadth, a wide number of different models given the size, and so on… Because if you get too narrow, you get too dependent on a given model, and if you suddenly have a problem with that model, you’re stuck.

So the data that we collect helps us make decisions along those lines. And now what people are doing - we’ve talked to companies that are doing it - they’re starting to use that data in more of a machine learning, or AI, if you want to go that far, type of way, to analyze it and predict failure moving forward.

[11:55] And I’ve seen some studies, and we’ve even talked about that in a blog post or two, about the AI, the use of AI… Or machine learning; that’s the more proper one here, it’s really not AI. And you see how you can make predictions on things like “Hey, based on the stats, the smart stats stacked up this way, this drive will die; it has a 95% chance of dying in the next 30 days.” That’s a really good piece of data to have in your hand, because then you can prepare for it. You can clone that drive, get a new drive, put the new drive back in, remove the one that’s going to fail, and you’re done. And you don’t have issues with durability. And I’ll get to that in a second.

But that kind of capability is really kind of cool. It also works the other way, where you can say, “A drive with these kinds of characteristics has a 70% chance of lasting the next two years.” Okay, cool. That means that from a planning point of view, I now understand that model’s failure profile, and I can move forward as I buy things and consider replacements and do migrations, and move from two to four to eight to twelve terabyte drives, and so on.
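For illustration only, here is a minimal sketch of the kind of machine-learning approach described above: fitting a simple classifier on a handful of SMART attributes to estimate a failure probability. The features, synthetic data, and model are placeholders, not Backblaze's actual prediction pipeline.

```python
# Sketch: estimating failure probability from SMART attributes.
# Illustrative only -- the features, data, and model are placeholders,
# not Backblaze's actual pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical raw values of a few SMART attributes
# (e.g. reallocated sectors, command timeouts, pending sectors).
n = 1000
healthy = rng.poisson(lam=[0.1, 0.2, 0.1], size=(n, 3))
failing = rng.poisson(lam=[30, 10, 15], size=(50, 3))

X = np.vstack([healthy, failing])
y = np.concatenate([np.zeros(n), np.ones(50)])  # 1 = failed within 30 days

model = LogisticRegression(class_weight="balanced").fit(X, y)

# Probability that a drive with these SMART readings fails soon.
drive = np.array([[12, 4, 7]])
print(f"estimated failure probability: {model.predict_proba(drive)[0, 1]:.2f}")
```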

I mentioned durability earlier… Durability is that notion of “Is my data still there? Did you lose it?” And all of the factors that go into durability get written down as how many nines of durability there are, right? But the thing that you want, that’s important, is to have everything in your system spinning all of the time. Well, that’s not a reality. So anytime something stops spinning and becomes non-functional, you have a potential decrease in your durability. So what you want to do is get that data, that drive, back up to speed and running as quickly as possible.

So if I take out a drive and I have to rebuild it - so I take out a drive that’s failed, and I put in a new drive, and it has to rebuild in the array it’s in - effectively, that might take days or even weeks. But if I can clone that drive and get it back in and get back to service in, let’s say, 24 hours, I just saved myself all of that time and that impact on durability.

So the data that we’ve been collecting all of this time gives us that ability to see those types of patterns, understand how our data center’s behaving, understand how particular models are behaving, and make good decisions from a business point of view about what to buy, maybe what not to buy, and so on.

Yeah. It’s a very complex business to manage those, I’m sure. Can you tell me more about the filesystem, or stuff at the storage layer that you’re doing? Because you mentioned cloning. I’m wondering, like, if you clone rather than replace and resilver, which is a term that ZFS uses; I’m not sure if it’s a term that crosses the chasm to other file systems, or storage, things like [unintelligible 00:14:57.00] or others, but… You know, to clone a drive - does that mean that that array gets removed from activity? It’s still active, of course, but you clone it so that there’s no new data written, so that that clone is true? You know, it’s parity plus data on there, and a bunch of other stuff. Can you speak to the technical bits of, like, the storage layer, the file system, etc.?

Yeah, so we didn’t build our own file system. I don’t remember right off the top of my head which one we actually use, but it’s a fairly standard one. What we did do is we built our own Reed–Solomon encoding algorithms to basically do the array. And we can do it in 17+3, 16+4, whatever the model is of data to parity. And it depends on the drive size.

So when you take a drive out that’s failed, if you have to replace it, that thing has to talk to the other drives in what we call a tome; a tome is 20 drives, that basically create that 16+4 or 17+3 setup. And that drive has to talk to all the rest of them to get its bits back, so to be rebuilt. And that process takes a little bit of time. That’s what takes the days or weeks, right?
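Purely as a structural sketch of what a 17+3 tome implies - not Backblaze's actual Reed–Solomon code - here is the availability arithmetic: each file's shards are spread across 20 drives, and a read can still be served as long as at least 17 of those shards are available.

```python
# Structural sketch of a Backblaze-style "tome": 20 drives, each holding
# one shard of every file, with a data+parity split such as 17+3 or 16+4.
# The real system uses Reed-Solomon encoding; this only models availability.
from dataclasses import dataclass

@dataclass
class Tome:
    data_shards: int = 17
    parity_shards: int = 3

    @property
    def width(self) -> int:          # one shard per drive
        return self.data_shards + self.parity_shards

    def readable(self, drives_down: int) -> bool:
        # Any data_shards of the width shards are enough to reconstruct.
        return (self.width - drives_down) >= self.data_shards

tome = Tome()
for down in range(5):
    print(f"{down} drive(s) down in a 17+3 tome -> readable: {tome.readable(down)}")
# 0..3 drives down: still readable; 4 down: data is at risk.
```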

[16:08] If I clone it, if I remove that drive, the system keeps functioning. That’s part of the parity mechanism, right? So no worries there. And then when I put the clone back in, the clone goes, “Wait a minute, I’m not quite up to speed”, the drive does, “but I’ve got a whole bunch of stuff, so let me see what I’ve got”, and that’s part of our software, that says, “Let me know where I am. Oh, I have all of these files, I have all of these pieces.” It does a bunch of things called shard integrity checks, and so on, and so forth, to make sure it’s all in one piece. And then it says, “Okay, I still need everything from yesterday at 3:52pm, blah”, and then it starts to collect all of the pieces from its neighbors, and rebuild those pieces it needs on its system.

In the meantime, the system is still functioning. People are adding new data, or reading data from it, and they’re picking it up from the other 19, in this case, and that one drive kind of stays in what we call read-only mode until it’s rebuilt. And then once it’s rebuilt, it’s part of the system. So you cut down that process of replacing that one drive from, like I said, weeks, perhaps into a day or two.

Right. And the software that you mentioned, that does the SMART reading etc. to kind of give you that predictive analysis of “This drive may fail within the next 90 days”, which gives you that indicator to say “Okay, let me pull that drive, do the clone”, versus just simply replacing it and letting it resilver, or whatever terminology you all use to sort of recreate that disk from all the other drives in its tome, or its array. You wrote that software. Is that available as open source, or is it behind the scenes, proprietary?

Right now it’s ours, if I was to say it in a very inelegant –

For sure.

These developers are gonna hear this and go – my guys are going to come yelling at me. [laughs] But it hasn’t been open sourced, and a lot of that has to do, like I said, with the fact that the edges aren’t very clean, so it just kind of works in our system, and goes from there. What it does today is it’s more of a confirmation using the smart stats system. So in other words, it’s looking for - I mentioned earlier, ATA errors, and so on, as the first identifier. And once it does that, then the smart stats get applied to see if it’s truly a failure, or if it’s some other problem that we need to go investigate.

Just to clarify too for the listeners, if you’re catching up to this: Self-Monitoring, Analysis, and Reporting Technology - that is what SMART is when Andy refers to SMART. It’s a feature in the drive, but it’s software that lives, I suppose, on the box itself, right? So it’s between the operating system and the hard drive having the SMART capability.

The SMART system is built into each drive. What happens is we run a program called smartctl that interrogates that data, which is just captured on each drive. Some manufacturers also keep another layer of data that they also have. So the drives are kind of self-monitoring themselves, and reporting data. And then we can ask it, “Hey, please give us that data”, and that’s what we do. Once a day, we say – actually, we run the SMART checks on a routine basis. It’s usually multiple times a day, but once a day, we record the data. That’s what makes up our drive stats. And so it’s each drive just holding it, and saying “This is my current status right now of SMART X and SMART Y.”
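For a rough picture of what "run smartctl and record a few attributes" could look like, here is a minimal Python sketch. It assumes smartmontools is installed and root privileges; the attribute IDs, file name, serial, and model below are illustrative, not Backblaze's actual collector.

```python
# Sketch: pull SMART attributes for one drive with smartctl and keep a few
# fields. Assumes smartmontools is installed and the script runs as root;
# not Backblaze's actual collection code.
import csv, datetime, re, subprocess

# Matches lines like: "  5 Reallocated_Sector_Ct ... -  0" (ID, NAME, ..., RAW_VALUE)
ATTR_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+.*\s(\d+)\s*$")

def read_smart(device: str) -> dict[int, int]:
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=False).stdout
    attrs = {}
    for line in out.splitlines():
        m = ATTR_LINE.match(line)
        if m:
            attrs[int(m.group(1))] = int(m.group(3))  # attribute ID -> raw value
    return attrs

def record_snapshot(device: str, serial: str, model: str, path: str = "drive_stats.csv"):
    attrs = read_smart(device)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today().isoformat(), serial, model,
                                attrs.get(5, 0), attrs.get(187, 0), attrs.get(188, 0),
                                attrs.get(197, 0), attrs.get(198, 0)])

# record_snapshot("/dev/sda", serial="ZA1XXXXX", model="ST12000NM0007")
```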

Some of the values update on a regular basis, like hours - there’s a power-on hours value. So I assume once an hour that thing gets updated. There’s temperature, which I think is probably something that is continually updated - I don’t know if it’s once a minute, once every five minutes, or whatever, but it has the temperature of the drive. So there are a lot of other things in there besides, you know, how good the media is, how much trouble I had writing to or reading from a particular sector, or sectors, as the case may be.

[20:14] Command timeout is a really good one, because that indicates that the thing is really busy trying to figure out what it was supposed to be doing, and it’s starting to stack up commands… And then there are some others that have interesting indicators, like high-fly writes, which is how far the head flies from the disk… And that number is – the tolerance on that is so thin these days. I mean, we’re talking nine platters in a drive; that head is really, really close… And so if it comes up even just a little bit, it’s getting in everybody’s way. So that’s another thing that gets monitored… And so on.

I was looking at a drive while you were talking through that bit there. I have an 18-terabyte, one of many in this array, and I was looking – so you’d be happy to know that my command timeout is zero. I don’t know what a bad number would be other than zero… So at what point does the command timeout of a disk get into the bad zone?

It’s a good question. It does vary, and it usually comes – that particular one happens to come with usually some other error. One of the things we’ve found when we analyzed smart stats individually is we couldn’t find any single smart stat which was indicative, by itself, of failure, until it got to be really, really weird. Like, I’m finding bad sectors. And so having a few bad sectors on a drive is just a thing that happens over time; and they get remapped, and everybody’s happy. But having a thousand is a lot. But maybe that’s not a lot on an 18-terabyte drive, because the sector size is the same, basically. But it is a lot on a 500 meg drive, or a 500 gig drive.

So they’re somewhat relative kinds of things, but no individual one generally is indicative of failure. It’s usually a combination of them. And then some drives just fail; they don’t give you any indication at all, and then they just go, “I’m done. I’m out of here.” We’ve seen that roughly 25% of the drives, at least the last time I looked at all of this, just failed, with no indication in the SMART stats at all. They just rolled over and died. And there doesn’t seem to be any relation to a brand, or a model, or a size, or how long they were in… It just seems to be that they get there.

Now, inevitably what happened is before they failed, something went wrong, and maybe the SMART stats got recorded. But we don’t record them every minute, because it would just get in the way. So maybe we missed it. I’m open to that as a possible explanation. But for most of them, you do get a combination of five or six different SMART stats that we really pay attention to. A combination of those - you’ll see those showing up about 75% of the time.

And like I said, there are some… Command timeouts is a good one, where “Hey, I’m having a little trouble. I’m having a little trouble. Oh, okay, I caught up” and it goes back down to zero. And then there are others, like bad sector counts; they just continue to go up, because – they don’t get better.

Yeah, they only get worse.

Once they get mapped out, they’re gone. And you have to understand that about each of the stats - whether it’s a static, always-up number, or whether it can get better. Things like high-fly writes - we see that happen. Almost every drive has that at some point or another. But the favorite way to look at this is over a year: say there are 52 of them. 52 is a high number, but if it’s once a week… Meh. If they all happened in an hour, I have a problem.

So there’s a lot of that going on with the SMART stats as well.
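As a toy version of that "no single stat is conclusive, look at a combination" idea: the five attribute IDs below (5, 187, 188, 197, 198) are ones Backblaze has written about watching, but the threshold and the two-or-more rule here are invented purely for illustration.

```python
# Toy heuristic in the spirit of "no single SMART stat is conclusive":
# flag a drive only when several of the watched attributes are non-zero.
# The rule itself is invented for illustration.
WATCHED = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    188: "Command Timeout",
    197: "Current Pending Sector Count",
    198: "Offline Uncorrectable Sector Count",
}

def suspicious(raw_values: dict[int, int], min_hits: int = 2) -> bool:
    hits = [WATCHED[attr] for attr in WATCHED if raw_values.get(attr, 0) > 0]
    if len(hits) >= min_hits:
        print("investigate:", ", ".join(hits))
        return True
    return False

suspicious({5: 12, 187: 0, 188: 3, 197: 0, 198: 0})   # True  -> worth a look
suspicious({5: 2})                                     # False -> a few bad sectors happen
```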

What causes a high-fly write? Is that like movement in the data center, physical hardware movement that makes it, or is it –

It could be. It could just be that the tolerances are so fine now that the actuator moving through there, plus a little vibration, to your point… Or maybe there’s just something mechanical in that actuator, where it occasionally has a little bit of wobble in it, for no particular reason. But it usually has to do with some movement. It’s never a good idea to have a spinning object at 7,200 or 15,000 RPM, or whatever, with the little head stuffed in there, you know, less than a human hair away, and to start jiggling it around. [laughter]

Yeah, for sure. Bad things happen.

Bad things happen.

Let’s talk about the way you collect and store this SMART data. Let me take a crack at this; this may not be it at all. If I were engineering this system, I would consider maybe a time-series database, collect that data over time, graph it etc. How do you all do that kind of mapping, or that kind of data collection and storing?

So once a day, we record the data for posterity’s sake. Like I mentioned earlier, we do record it –

Like you take a whole snapshot of what gets spit out from smartctl, you grab all that?

We grab a particular – I think they go… We call them pods, okay? The original storage pod, 45 or 60 drives, and then we go pod by pod. That’s the way we think about it. So we go to that pod and we run smartctl on the drives in there, we pull out the data, we store that data, and then we add a few things to it. We keep the date, we keep the serial number, the model number, and the size of the drive, and some other basic information that we record from the device itself. So we know what storage pod it’s in, we know which location it’s in, and so on and so forth.

At that point, we have the data, and then we store it into - I’ll say a database [unintelligible 00:26:15.19] actually stores locally, and then it gets transferred up overnight. That’s part of – the boot drives get to do that fun stuff for us. So we take the snapshot of all of that data, we store it locally, then it gets brought up overnight… Then there’s a processing system which goes through and determines the notion of failure. So if a drive reported something to us, it didn’t fail yet. The next day, we go back to that same pod, for example, and we notice that one of the drives is missing, right? We look for 60, we only got 59. What happened? So that gets reported. And then that becomes part of what the software on our side processes over the next few days. “Tell me about that missing drive. What happened to it?” And at that point, we interact with some of our other systems - our maintenance and inventory systems - to see what actions might have been taken on that drive. We also have some smarts built into the software itself, to help identify those things. And if all of those things make sense, then we go, “It failed”, or it didn’t. “It didn’t, because it was a temp drive that was in for a few days, and then it got taken out and replaced by the clone…” So it didn’t really fail, it just didn’t get a chance to finish. So we shouldn’t fail it, right?

Or we migrated that system. We took all of the drives out, and we went looking for them, and they weren’t there. But we don’t want to fail 60 drives, and so that’s not what happened. So the system figures all of that kind of stuff out. Like I said, it can connect up to the inventory and maintenance systems to help validate that, because we have to track all of those things, for obvious reasons, by serial number.

[28:00] So it’s fairly complex software that goes in and does that analysis, and it takes sometimes a few days for it to kind of figure out whether a missing drive is really a failed drive, or whether a missing drive is a good drive, and it just got removed for a valid reason. And then once that happens, then we record it. Once a quarter, I go in and I pull the data out and look at it, and I’m looking at it for the most recent quarter. I actually go back in and validate all of the failures as well by hand, against the maintenance records in particular, just because we want that data to be as accurate as possible. And then we confirm all of that, and almost always we get a really solid confirmation. If we find anything funny, we keep looking. And that’s the data we publish, and that’s the data we base the reports on.
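To make the shape of that daily record and the missing-drive check concrete, here is a hypothetical sketch. The fields echo the published Drive Stats files (date, serial number, model, capacity, a failure flag, plus SMART columns); the maintenance-log lookup and the removal reasons below are made up for illustration.

```python
# Hypothetical sketch of the "missing drive" logic described above:
# a drive present yesterday but absent today only counts as a failure
# if the maintenance/inventory records don't explain its removal.
# The removal reasons are invented for illustration.

def confirm_failures(yesterday: set[str], today: set[str],
                     maintenance_log: dict[str, str]) -> list[str]:
    failures = []
    for serial in yesterday - today:            # drives that went missing
        reason = maintenance_log.get(serial)
        if reason in ("migration", "temp drive removed", "cloned and replaced"):
            continue                            # valid removal, not a failure
        failures.append(serial)                 # otherwise, mark as failed
    return failures

print(confirm_failures(
    yesterday={"S1", "S2", "S3"},
    today={"S1", "S3"},
    maintenance_log={},          # nothing explains S2's absence -> failed
))
```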

In terms of your data centers, do you have many of them throughout the world? I assume the answer is yeah.

Yeah, we have five right now. Four in the US and one in Amsterdam. And they all run the same software, and the process is the same. And the automation all occurs in the front end. That’s all fun, and stuff like that. The validation, if you will, is me, and a little bit of that comes from me putting my name on this thing. So I want to make sure the data is right. So I don’t want to automate myself yet… [laughs]

Not yet. We’ll have Andy Klein AI at some point.

Yeah, exactly. Well, I’m not quite ready to turn drive stats over to ChatGPT yet… And I think – I don’t know how long I can continue. I mean, we’re up to almost a quarter of a million drives right now. Luckily, we get –

In service, currently?

In service now, yeah.

That’s a lot of drives.

And so in any given quarter we’ve got – the last quarter we had 900 and something drives that failed. That sounds like a lot, except we have 250,000… So no. But it is an intensive bit of work for me to do the validation, but I do think it’s worth it… And yes, we are looking at systems which will help improve that bit of validation as well. But like I said, this just comes historically from eight years of me putting my name on this and wanting to make sure that the stuff that we publish is as good as it can be.

Doing some quick math here, it sounds like maybe 99.8% of your drives remain in service. 0.2% is what fails in a quarter, roughly.

It could be that’s a fair number. We actually do an interesting calculation, because that basic assumption there assumes that all of those drives have been in service for the same length of time, and that’s not the case, of course.

And so we actually count what we call drive days. So a drive is in service for one day, that’s one drive day. So if you have drive model ABC, and there are 10 drives, and those 10 drives have been in service for 10 days, that’s 100 drive days for that model. It’s the simplest way to do it. And so we count that, and then we count failures, of course, for that model, or all of the drives, or whatever the case may be. Model is the most logical way to do all of this stuff. Any other combination and - I’ll be honest - you’re cheating a little.

[36:10] We do it quarterly for all of our drives, and then we also do it for the lifetime of all of our drives, each quarter. But we also publish them by model number. And the model number is the more appropriate way to look at it - not just the macro number. The macro number we come up with, for example, might be like 1.4%, 1.5%… And that’s a solid number, and it’s a good number, but it’s for all of the different models. And they vary in their failure rates over a given period of time.

So drive days is the way we do it… When we first started down this road back in 2013, we spent some time with the folks at UC Santa Cruz, who are really smart about things like this, and they gave us a very simple formula, which was based on drive days to do that computation of the failure rates. And then we explain it; almost every quarter, we have to explain it, because most people do exactly what you did - how many drives you’ve got? And you do the division. And it’s the most logical thing to do, but it doesn’t account for the fact that, like I said, at any given moment we have new drives coming in, we’re taking out old drives, and so on. So all of that changes, and drive days accounts for it.
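Concretely, the drive-days approach works out to an annualized failure rate of failures divided by drive years (drive days / 365). A quick sketch, using round numbers from the conversation:

```python
# Annualized failure rate (AFR) the drive-days way, as described above:
# AFR = failures / (drive_days / 365) * 100
def afr(drive_days: int, failures: int) -> float:
    return failures / (drive_days / 365) * 100

# 10 drives of model "ABC" in service for 10 days = 100 drive days.
print(afr(drive_days=100, failures=0))          # 0.0%

# Something closer to fleet scale: ~250,000 drives over a 91-day quarter,
# with ~960 failures (numbers from the conversation, rounded).
print(round(afr(drive_days=250_000 * 91, failures=960), 2))  # ~1.54% annualized
```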

Do you do much preparation for a drive to go into service? Like, do you do a burn-in test? Do you worry about bad sectors before you put them in? Or you just roll the dice because you’ve got so much history that you kind of have confidence. How do you operate when you add new drives to the system?

That’s a really good question. When we first started out, we would put all of the drives into a storage pod, and we’d burn it in, so to speak; we’d run it for a week or so. We still do that to a degree, but that burn-in period’s a whole lot less. But when we replace a single drive, we don’t burn it in, if you will. They put it in, and it obviously has to run through a series of [unintelligible 00:38:04.13] and so on, in order to even – you know, did it come up? What does it look like? What does the SMART stats look like? And if it passes all of those basic things, then it’s in service.

I think one of the things that’s really helped us with that over the years - my goodness, it’s probably been four or five years now… I was at the Seagate facility in Longmont, Colorado, where they do their prototyping for all of the drive builds, and so on and so forth. And one of the things that they do - and they do it at all of their factories at some point - is once the drives come off the line, so to speak, they actually put them in a testing box, and they run some tests on it for a few hours, days, whatever their period of time is. And you can see that when you get a “brand new” drive, it has power on hours, 16, 20, 25 whatever. So it’s not zero. So they did some banging on it, to make sure you don’t get a DOA drive. And so I think that has helped. And I’m relatively sure all the manufacturers do something like that, although Seagate’s the only one I’ve actually ever seen.

Yeah. Well, that’s my drive of choice. I use Seagates. I was on IronWolf for a bit there, and then IronWolf Pro, in some cases, I think mainly for the warranty that you get with the pro label; you get a longer warranty, which is nice. Not necessarily a better drive, but definitely a better warranty… And then my newest drive I’ve gotten from them was the – I think it’s called the Exos. I’m not sure how – do you know how to pronounce that, by any chance?

That’s as good a chance as any. I’ll go with that one.

Yeah, sure.

Ex Os. [laughs]

Ex Os, there we go. We’ll call it Ex Os then. I think that probably sounds better.

[39:45] I think that’s the ones we actually use as well. Yeah, so it’s interesting… We trade off, and we have an opportunity to do something which I’ll say Joe consumer doesn’t have; we can trade off failure rates for dollars, right? And I’m not going to pick on any drive or manufacturer, but if a particular drive has a failure rate that sits at 2%, and a different drive has a failure rate of 1%, we look at the cost and we can say, “Well, wait, the one with 2% costs us $100 less.” And looking at the lifetime cost of that, replacing these drives over a five or seven-year period or whatever it is, we’re going to save half a million dollars if we just buy those. So we can do that. And people at home with one drive don’t really have that – maybe that’s not the decision they want to make. That’s why we always tell them, “Hey, there’s this company that backs up things. Maybe –” But anyway.
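To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. All of the figures (fleet size, prices, replacement cost) are invented to mirror the example, not Backblaze's actual costs.

```python
# Back-of-the-envelope version of the failure-rate-vs-price trade-off.
# All numbers are invented to mirror the example in the conversation.
def lifetime_cost(drive_count: int, unit_price: float,
                  annual_failure_rate: float, years: float,
                  replacement_cost: float) -> float:
    expected_failures = drive_count * annual_failure_rate * years
    return drive_count * unit_price + expected_failures * replacement_cost

fleet, years, replace = 10_000, 5, 250    # assume $250 to swap/rebuild a failed drive

cheap = lifetime_cost(fleet, unit_price=300, annual_failure_rate=0.02,
                      years=years, replacement_cost=replace)
pricey = lifetime_cost(fleet, unit_price=400, annual_failure_rate=0.01,
                       years=years, replacement_cost=replace)

print(f"2% AFR, $100 cheaper per drive: ${cheap:,.0f}")
print(f"1% AFR, pricier drive:          ${pricey:,.0f}")
print(f"savings from the cheaper drive: ${pricey - cheap:,.0f}")
```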

That’s right, Backblaze. Yeah. So that’s cool though, that you get to do that kind of trade-off; as you said, dollars per failure, things like that… I think that’s really interesting. Do you have some program or formula that you use behind the scenes, that lets you make those choices? And then too, I guess, when you’re buying the drives, can you use your data as leverage? “Well, hey, [unintelligible 00:41:05.02] based on our stats in the last 10 years, your drives fail more often, so we want to pay less for your drives, because we’ll have to buy more of them sooner. We’re happy to keep buying them. However, they’re going to fail more often, based on our data.” Does that give you leverage?

So I’m not the buyer, but I do know that the data gets brought up from time to time in conversations with the various different companies. Now, inevitably, they have a different number; they always do. They publish it on their data sheets. And every drive I’ve ever seen has either a 1% annual failure rate, or a 0.9% failure rate. So that’s the range - it’s like 0.9 to 1. And so that’s what they believe is their number, and they get it through calculations of mean time between failures, and so on and so forth, of all of the components. So that’s the number they believe. Now, whether or not we influenced that, and we say, “Well, look, we’ll go buy these, and we’ll do this trade-off” - you never know what numbers you’re going to get from a manufacturer at a given time.

The other thing that we do is I don’t need the latest and greatest drives in my data center, because why would I overpay for them? So we’re going to use drives that are somewhat down the price curve, and have a proven capability in the market already. And so you’re better off negotiating from a point of view of where you are in that price curve - your drives fail more or your drives fail less, kind of thing. That’s one. And two, it’s so different model by model; you may get one model of a 16-terabyte drive that, let’s just say, Seagate makes, and its failure rate is 0.5%. It’s great, half a percent. And then you may get another 16-terabyte drive from Seagate, and it fails at 2%. So what do you do, right? You just negotiate based on where they are in the curve. That’s the best thing to do. If you’re going to buy 22-terabyte drives right now, you’re paying a pretty good premium for them. So I don’t want to buy 22-terabyte drives right now. I’ll wait until the price comes down, and then we can buy 22s, or we can buy 24s, or whatever the case may be, and we’ll know a lot more about the history. So we’re buying into the pricing curve as it drops.

Can we talk a bit about your storage pods themselves? I know that there’s some history there with Protocase - I’ve read up on the history because I’m a 45 Drives fan, the brand 45 Drives, so I kind of know some of the storage pod history - where you all had a prototype and a desire for a design, you went to Protocase, and collaboratively you came up with what was Storage Pod 1.0. I think you even open sourced the design, and stuff like that, so people can go and buy it, which was what really drove a lot of the early demand for the 45 Drives spin-off from Protocase to become a legitimate brand… And then there were needs of folks who wanted high-density storage, but not the Backblaze version of the Storage Pod, because you had different circumstances and different choices you were making, because you had different business metrics you were basing choices on. Like you said, you didn’t want the latest/greatest drive, you wanted something that had actually proven itself in the market. You had different demand curves that you were operating on, so you were not the same as everyone else.

[44:30] Long story short, give me the story of the Storage Pod. Help me understand the design of it, so to speak, like - 15 drives, 30 drives, 45, 60… I know that they’re 15 per row; I don’t know what you call them, but… Give me the layout of the Storage Pod. Tell me more about it.

Sure. So the 15, just to start with, is actually the size of the first array we used. We used RAID 6 when we first started. And so we did it in a – I think it was a 13+2 arrangement, and so 45 just came from three rows, effectively. Now, we actually – just mechanically, we didn’t lay them out like an array in each row. We actually moved them around. And that had to do with the fact that the backplanes that we use were five drives each, and so you didn’t want to overload a given backplane with all of the commands going on. So you wanted to move it around; it was just a whole lot more efficient that way. It also had to do with the fact that if you lost a backplane, you would lose five drives, and suddenly that array, you couldn’t get any data out of it. So it was a way to improve durability.

But when we started building those - and you’re exactly right - we had a design; we sketched it out in our heads. Actually, we built it out of wood, okay? And someplace, in a blog post somewhere, there’s a picture of a wooden storage pod, with the slats and everything… We built it out of wood, and we said, “Hey, we don’t know how to bend metal; we don’t know how to do any of that.” But what we understood was that the design would work. Because before we built it out of wood, we actually plugged together a bunch of five-unit Drobo-like systems, and did all of this analysis, and said, “This will work. And if we do it right, we can get the cost down.” Because if we were going to use, for example, even at that time, S3 as our backend, instead of doing it ourselves, we couldn’t offer our product at the price point we wanted to; we would actually have had to 10x it. So rather than getting unlimited storage for your PC for five bucks a month at the time, you were going to have to pay a lot more. So we decided to build our own, and design our own.

And then we went to the folks at Protocase, and I don’t know how we found them, to be honest with you, but they helped build that. And they’re really good at it. They really understand how to bend metal, and they can produce all of the designs… And that’s exactly what we did. And then we turned around and said, “Okay, well, this is great, and we like it. Let’s open source it. Let’s tell the world about this.” And that’s what we did way back in 2009 now.

And then we changed the design over the years, and added some things… But to your point, at some point the folks at Protocase said, “Well, this is really cool. Maybe we should be making these for other folks.” Because we had made the decision that we wanted to be a software company, and not a hardware company. And people were asking us to make storage pods for them. And we went “Well, there’s like nine of us who work here. I don’t think we really can – we don’t have a lot of time…” [laughs]

“That’s not our business model.”

And so “No, we’re not going to make it.” Now, the truth is we actually did make a few of them, because somebody was going to give us a whole bunch of money for them, who shall remain nameless… And so we took the money and made a couple of storage pods, but that wasn’t going to be our business.

And Protocase stepped forward and said, “Well, I think that’s a really cool idea. Maybe we should start doing this.” And that’s where they kind of started. And then they could customize them. We needed them for a very specific purpose. We used commodity parts in them; when we published it, you could build your own. You can go down and buy your own Intel cards, and your own Supermicro motherboards… And the only thing that was funny was the power supply cable had to be custom-made, because it went to two different power supplies that came into the motherboard. But other than that, everything else was basically do-it-yourself. Even the backplanes at the time you could buy. So it was really, really cool that they could do that.

A lot of folks, once we published it, actually started building their own storage pods, which is even cooler, right? But the 45 Drive guys took it and they said, “You know, if we could let people customize this… So maybe we’ll produce some different versions of it. Let’s make a really fast version, yay!” and they could upgrade it. And that’s where they started to differentiate themselves.

Then they went to direct connect drives, instead of using backplanes. And I don’t know exactly when they made that decision, but that’s kind of where we parted with them… Because they wanted to go down the direct connect drive path… Which was great, and I think to this day, that’s the technology that they use. And we stayed with backplanes. And so we eventually went and used other manufacturers.

These days, to be quite honest with you, we actually buy storage pods, if you will, from Supermicro. And they are Supermicro servers; they’re not ours, they’re not even painted red… [laughs] And we just buy them off the rack, so to speak… Because they give you the ability to pick a model and put a few options on it, and we say “Give me 500 of those.” And then they ship them, and we’re happy as clams with those. We don’t have to worry about storage pods, and updating the design, or anything like that.

And the 45 Drive guys - they’re doing great. I like them because they’re the true customization place. You can go over there and say, “Hey, I want one of these that kind of looks like this, and paint it blue. And oh, by the way, I like that piece of software, so let’s put that on there, put our clone on it” etc. And you get what you want, and then they support it, which is even better… So cool.

I think it’s interesting, because I have an - AV15 is what they call it; that’s the model number for their Storinator, ten feet to the left of me over there, with 15 drives in it. And so mine is AV15. That’s what the 15 is, it’s a 15-drive array. It’s based on this original storage pod that you all started out with. I think that’s just so cool, how - you know, I never knew you, I didn’t know the full Backblaze story. I came to learn of 45 Drives; I was trying to build a high-density storage array for myself for our own backups, and a bunch of other fun things, and just a home lab scenario… And it’s just so cool to have a piece of hardware over there that’s based on early ideas of you all. And now you’ve actually abandoned that idea, because you’re like “You know what? We want even more commodity. We had a great design, sure, but we actually just wanted to get it directly from Supermicro, and just literally take it from there and put it into our racks.”

Now, can we get into your data center a bit? …because I’ve gotta imagine these things get pretty heavy to lift. I read the blog post that you had up there, which was kind of a behind-the-scenes of your US East data center. And I actually just noticed this; I’m glad you mentioned the change of heart, when it comes to your storage pod, that you no longer use a custom version for yourselves, that you just buy it directly from Supermicro. So it’s still a 4U server, which is a great size for that… And you have them stacked 12-high in a cabinet, and you leave 4U at the top for a 1U core server, and an IPMI switch and interface. Can you talk to me about that design, the entire stack? How much work went into designing that 12-high cabinet, for example?

Well, the first thing you have to start thinking about obviously is how much space there is. But the next thing you have to think about is electricity, and how much electricity you can get to a rack. Because let’s face it, if you’re spinning that many drives, it takes a little bit of juice.

And so some of the earlier places we were in from a data center point of view, they said “Okay, so here’s a rack, and here’s 25 amps. Have a good time. And oh, by the way, you can only use 80% of that.” And so you suddenly go, “I can only stack storage pods so high”, especially as the drives got bigger and started soaking up more and more electricity. And so now you go, “Well, I can put 4-terabyte drives here, but I can’t put anything with 8…” But that’s changed over time, as people actually realized, one, that these things use electricity.

So you go into a facility like that, and you say, “Okay, so do we have enough – how much electricity we got? Okay, we have plenty. Great.” For the drives today, the drives tomorrow, and so on. And then it becomes a floor layout issue; how do you optimize the space? How much air cooling do you get? Because these things will definitely produce a little bit of heat.

So you could put all the racks really, really close if you wanted to, but then you’re not getting the proper circulation, and it’s really difficult to do maintenance, and all of that. And there are a lot of really smart people out there who kind of specialize in that.

Once you decide on where you’ve got to put them, then it’s not only your storage pods, but all of the ancillary equipment, the other servers that go in. For example, restore servers, or API servers. So now we do the S3 integration - for the B2 storage piece we had our own APIs, and now we also support the S3 API as well. They don’t work the same. So when you make an S3 call, it actually has to kind of be turned into our call on the backend, and we had to have a group of servers to do that. And so we have those kinds of servers.

And then you have just utility boxes, and monitoring systems, and so on and so forth, that all have to be built into that. So we may have an entire rack of nothing but support servers. The architecture is such that you have to know where all of the data is. And so we have servers in there, that’s their job; they know where the data is, which storage pod it is, and so on and so forth. So you go and say “Hey, I would like to have a file”, and you ask that device, assuming you’ve been authenticated etc. And it says, “Okay, you’ll find it over here. And here you go. Have a good time.”

And the same thing when you want to write something, okay? The way we write things is pretty straightforward, which is we literally connect you to a tome, actually, to a particular pod in a tome. You say, “Hey, I have some data and I want to write it”, and we say, “Great, here’s where you’re gonna put it.” And you’re right there. And then we have to record all of that, of course, and keep track of it for future reference, so you can get your data back.

So that whole process of laying things out - like I said, the biggest one starts with what’s your footprint, and then how many racks can you get in there, how much electricity can you support, how much cooling is there, and so on. And then of course, you just have to deal with the fact that these things are big.

So going up is really, really cool, because if we can get it – the only issue ever became one of “Does the lift go high enough - good old Luigi there - so that we can get them out, so we can get them back down?” What do we have to do? If I have to bring a ladder with me every time to service a storage pod, maybe that slows me down a little bit.

If you lift it…

They are heavy, but yeah, you can get on the lift… [laughs]

Well, I mean, even my 15-drive array, if I have it fully stacked to put it back in the rack, or to pull it out with – and it’s got rack rails. I mean, it’s heavy. I didn’t weigh it, but it’s effort. It’s not like a little thing. It’s thick, and it’s just 15 drives. Now, if you get 60 in there…

[56:10] Yeah. And they come bigger; you can get them as high as – I think I’ve seen 104 now in there… So with 60 - yes.

You don’t want to drop it either, right? I mean, that would be the worst thing ever.

No, you don’t want to drop it. When we first started the company, myself and Yev, who’s one of the other guys in marketing - a bit of a character - he and I used to go to local trade shows and stuff, and we’d bring a storage pod with us. But we only brought it with five drives in it, because quite frankly, we had to lug it through streets, and upstairs, and all kinds of things like that… [laughs] So yes, they do get quite heavy, and that’s why we have the rack in place. And no, we don’t let people cart them around, and all of that. But yeah, we do want to optimize the space.

But we do need to get in them from time to time, to replace a drive… So you don’t want them to be right at the top of the rack, and so you put in some of the other equipment which doesn’t require as much hands-on maintenance up there.

So a 52U server rack, you’re stacking them 12-high; they weigh roughly 150 pounds each, 68 kilograms… Roughly, just assuming that. And then to lift that - I think in the details here in your blog post is Guido.

Guido, yeah.

Guido is mentioned. And I think that’s like a server lift, it’s like a thing. How did that come about?

So that started with the very first ones at 45. Our first rack that we built, it was like a half-height rack, and it only went up four. That was our first setup. And as soon as it went higher than four, people went “This is really heavy. We need to figure this out.” So you can get a server lift, and that’s what we did. We actually had to raise money, way back when, to buy a server lift, because they’re not cheap. And that was Guido, who was named after the forklift in Cars, by the way; the movie Cars. And then later on we added Luigi… I know all of the data centers have their own. I don’t think the rest of them have funny names for them, although I’ll have to ask, I guess…

Yeah, I was thinking that was like the name of that one. Was it Luigi, the character that sold the tires, and Guido was his sidekick? Is that correct?

I think so. It’s been a few years since I watched the movie.

I like that though. That does make sense. Yeah, okay. So yeah, I’m looking here quickly… Guido was the kind of blue lift-looking thing, and I believe Luigi was the Ferrari lover. The Italian.

There we go. Yeah.

So that was our buddy, my buddy Sean, who ran our data centers for a number of years before moving over to another job within Backblaze… But he was the one who named those things, so he has a bit of a sense of humor.

So we’ve kind of touched a little bit on this to some degree, but tell me - here are two questions I want to ask you, or that I want to get to at least. I want to know how you all buy hard drives, and then I want your advice for how consumers should buy a hard drive. So we touched on it a little bit, but give me deeper detail. How do you choose which manufacturers – it seems based on your data you have four that you choose from. I believe HGST, Seagate was one we talked about already, Western Digital, of course, is always in that mix… And then I think sometimes Micron’s in there. It depends if those are the SSD stacks, but…

Toshibas are the fourth.

Toshibas. Okay, so you primarily map around four different manufacturers. How do you – like, do you buy them in batches? Do you have a relationship with the manufacturer? Do you have to go to distributors? How does it work for you all to buy? Like, how much of a lift is it for you to buy drives? And when you do buy them - I’m assuming it’s once a quarter, because you’re tracking failures once a quarter… How often are you buying new, and how many do you buy when you do buy them?

So it’s actually a very variable process. And HGST, just to kind of fill in the gap there - HGST as a company got bought by Western Digital. It got split up between Western Digital and, I think, Toshiba years ago.

And so we have HGST drives, but they’re historical in nature. And so now we deal with WD, Western Digital, to get what effectively are HGST drives. But the process is you maintain relationships with either the manufacturer directly, or the manufacturer will send you to a distributor. You almost never buy directly. We don’t buy directly from the manufacturer. You always buy through a distributor. We always buy through one. Maybe Google, or somebody like that, can change that. But companies of our size, we’ve always bought through a distributor. It’s just the way it works. That’s who the contract is with, and so on and so forth.

We don’t buy drives – well, originally, we used to buy drives as we could afford them… [laughs] But those days are over, and now we buy them based on – the first thing you want to do is figure out your needs, your storage needs out over, let’s say, the next year and a half, two years - how much do you think you’re going to need, how much growth in storage. And then you start dividing by where you are on that curve. Remember, we talked about that earlier.

So if I’m trying to buy something, I want to buy something in the middle to the bottom end of the curve. But sometimes you can’t get quantity down there through a distributor. So you have to – it goes back and forth. We also say – let’s decide that we’re going to get 8-terabyte drives, and we want to buy 5,000 8-terabyte drives. We’ll go out to the manufacturers - or the distributors in this case - and say, “Hey, we’re looking for some of these. We’re looking for 5,000 of these 8-terabyte drives. What have you got?” And they’ll come back with “Well, I don’t have that model, I have this model; it’s an older model (or it’s a newer model), and I can sell you not 5,000, I can sell you 7,000 at this price.”

[01:06:26.22] So you get all of these things that come back, and you negotiate back and forth, until you finally get to somebody or someone that you can buy from. And you place the order. And the order becomes one of – so how often do you do it? We like to buy them so we have a nice cushion. But if you buy so many at a given price, and six months later they’re down 20%, that’s extra money you just had basically sitting on your data center floor. So you want to be efficient in how you buy them, but you always want to have a buffer. And a good example was the supply chain problems that happened over the pandemic… And we had that buffer.

So the first thing we did, as it started to look like things were getting tight, was place a bunch of orders for a bunch of equipment; not just drives, but all the support equipment, and everything like that. But we had a buffer in place. And as prices went up - because they did - we were unaffected by that, or minimally affected by it.

So it really is just a matter of what’s available… We know what we need; we ask the manufacturers, “Hey, this is what we need, and this is the timing we need it in.” They come back with the bids, basically, and say, “We can deliver this here, this many, at this price, at this time.” And that’s also important. So just-in-time manufacturing, or just-in-time warehousing, whatever you want to call it, is part of that math that goes together. And sometimes manufacturers/distributors don’t deliver. “Hey, I was gonna get you 3,000 drives by Friday. I can’t make it. I don’t have them.” And at that point, that’s why you have a buffer. And then you have to make a decision. “Well, okay, when can you have them?” “Well, I can have them in 30 days.” “Okay, that’ll work.” “I can’t have them for six months.” Then you’d better find a substitute.

And you want to maintain good relationships, of course, with all of these folks. And I think we do have good relationships with all of them. The Seagate one has been a solid one over the years; the Western Digital one has gotten a lot better over the last three or four years with them. And Toshiba has always been a pretty good one for us; we meet with them on a regular basis, so they understand our needs, and can help us decide what to do next, because they’re part of that. They may have something coming down the road, that we’re not aware of, and they go, “Hey, we have an overproduction of 12-terabyte drives out of this factory in here. I’ll tell you what we can do.” Those kinds of things come up from time to time.

For sure. How do you determine – it may not be an exact answer, but how do you determine 8-terabyte, 12-terabyte, 16-terabyte? Is it simply based on cost at the end of the day, or is it based upon capability? How do you really map around the different spectrums? Is it just simply what’s available at the cheapest, or that curve? Is it always about that cost curve?

That’s what you want to start with, but it’s not only about that.

Do you limit it within that range though? So anything above that curve, it’s like “That’s out of the question, unless there’s a reason for it”?

We bought some new drives way back – I remember the time we did it. We bought some – I think it was 12-terabyte HGSTs or something at the time, and they were almost 2x what we would normally have paid for that drive. So we do it from time to time, if it matters from a timing point of view, or something like that.

[01:10:03.24] We also do it from an experiment point of view. “Sell me 1,200 drives.” That’s a vault. And we’ll pay a little bit extra for it to see how they really work. Do these kinds of drives meet our needs, for example? You also do it a little bit for goodwill… [laughs] There’s still some of that out in the world.

And then the other side of that, the flipside of that is somebody may come back and say, “Hey, I have a bunch of eights. We’re at the bottom of the curve. Basically, here, they’re almost free.” And you buy them, and use them for substitutes, or something like that; or you maybe use them for testing purposes. We have a mini-setup for a vault that we use for testing purposes and testing software, and sometimes they go in there.

So there’s all of these different things that play into that decision. The logical thing to say is “Well, always give me the biggest drive, because that’s the best, most efficient use of space.” And that’s important. But all of the other things start to factor in, like “Well, that 16-terabyte uses four times the electricity of that 4-terabyte. Wow. How much is that gonna cost us?” Or it produces so much more heat. Or when we use it, it’s slower, because it’s such a bigger drive; it’s not as fast, it doesn’t give us the data quick enough. And I’m using that as an example.
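
To put rough numbers on that electricity trade-off - purely a back-of-the-envelope sketch; the wattages and power price below are assumptions, not figures from the show:

```python
# Rough annual power cost per drive; the wattages and $/kWh are assumptions.
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.12  # assumed electricity price in dollars

def annual_cost(watts: float) -> float:
    """Cost of running one drive continuously for a year."""
    return watts / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH

for label, watts in [("smaller drive", 5), ("bigger drive", 9)]:
    print(f"{label} at {watts} W: ${annual_cost(watts):.2f}/year")
# ~$5.26 vs ~$9.46 per drive per year - small per drive, but it adds up
# across tens of thousands of drives, plus the cooling to match.
```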

Right. Even though it’s a 7200 RPM drive, it’s still slower on the data out [unintelligible 01:11:33.16] is slower.

[unintelligible 01:11:34.04] is slower. So you trade off those kinds of things as well. The other one which most people don’t recognize is when you get into this whole game of rebuilding a drive. I can rebuild a four-terabyte drive in a couple of days.

Way faster. Yeah.

Right? What does it take me to rebuild a full 16-terabyte drive? Weeks. So does that change your durability calculations? What do you have to do in order to make sure that you’re staying at the level you want to stay at for that?
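
To make that rebuild math concrete - a quick back-of-the-envelope sketch; the sustained rate is an assumption, and real rebuilds (parity recompute, live customer traffic) run slower than this lower bound:

```python
# Lower-bound rebuild time: just streaming the whole drive at a fixed rate.
# The 150 MB/s figure is an assumption; real rebuilds add parity math and
# compete with live traffic, so they take considerably longer.
def rebuild_hours(capacity_tb: float, rate_mb_s: float = 150) -> float:
    capacity_mb = capacity_tb * 1_000_000  # decimal terabytes, as drive vendors count
    return capacity_mb / rate_mb_s / 3600

for tb in (4, 16):
    print(f"{tb} TB: ~{rebuild_hours(tb):.0f} hours of raw transfer")
# 4 TB  -> ~7 hours
# 16 TB -> ~30 hours, before the real-world slowdowns that stretch
#          a rebuild into days or weeks
```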

Well, something you just said there made me think about saturation. So while you may use a 16-terabyte drive, is there a capacity limit? Do you go beyond a 50% threshold? For example, if you have an array of 16-terabyte drives in your tome… I assume a tome is one single pod. Or is a tome –

It’s actually spread across 20 pods. It’s one drive in 20 different storage pods, yeah.

Okay. So given that, do you fill those disks fully up before you move along? Do you operate at full capacity?

It’s a good question. We do run them at above 80%. And the reason has to do with the fact that there’s a notion of filling them up and then – so the way our system works is you’re given a URL to the data, to your location, to your particular tome, if you will, particular drive. So we fill those drives up to about 80%, and then at that point, there are no new people coming in. What happens then is that existing customers, they say, “I have some more data for you. I have some more data for you.” And they continue to fill that up until we get to - I think it’s like 90%, 95% or something like that. At that point then we hand them off and we say “Go someplace else. Here’s a new place to write your data.”

So we have this process where we get to where we slow it down, slow it down, stop writing new people, let the existing people write into there to fill it back up… Then we also have a whole mechanism in place for data recovery, space recovery. Because we don’t charge extra for any of that kind of stuff, because we use PMR drives, or CMR drives. That’s just the normal process. Deletion and reuse of space is an easy thing. It’s not like an SMR drive, where that’s expensive to do.

[01:13:59.10] And so we delete data and recover the space and reuse it… So maybe we get to 95%, but then people delete files, and we come back down, and then we can add some more, and so on and so forth. So that seems to be about the right range for us. But they are definitely chock-full of data.
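
As a rough illustration of the fill logic Andy describes - not Backblaze’s actual code; the 80% and 95% cutoffs are simply the figures from the conversation:

```python
# Illustrative only - the states and thresholds come from the conversation,
# not from any real Backblaze implementation.
NEW_CUSTOMER_CUTOFF = 0.80   # stop pointing new customers at this tome
REDIRECT_CUTOFF = 0.95       # tell existing customers to write elsewhere

def accepts_write(used_fraction: float, is_new_customer: bool) -> bool:
    """Should this tome take the write at its current fill level?"""
    if used_fraction >= REDIRECT_CUTOFF:
        return False                      # "Go someplace else"
    if is_new_customer:
        return used_fraction < NEW_CUSTOMER_CUTOFF
    return True                           # existing customers keep filling it up

print(accepts_write(0.75, is_new_customer=True))    # True
print(accepts_write(0.85, is_new_customer=True))    # False - no new people past 80%
print(accepts_write(0.85, is_new_customer=False))   # True
print(accepts_write(0.96, is_new_customer=False))   # False - hand out a new location
```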

Yeah. So the reason why I asked you that, to get clarity, was because I may have, in my example, an 18-terabyte drive in an array… But that entire array, or that entire VDEV, is not full of data. Like, every 18-terabyte drive is not chock-full, because that’s not the way I’m using it. Backblaze is way different; you’re trying to be inexpensive storage that’s reliable, and easy to access, fast to access etc. Fast to back up to… Then you also have your B2 service, and a bunch of other reasons of how you use your hardware. But my use case is different. So now, dovetailing into the way you buy drives, which is very unique and very different… I don’t have a – I guess I’m at the whim of the retailer. So I would typically buy from B&H, maybe Amazon, maybe Newegg, maybe CDW, for example… These are places I might go to buy consumer-level hard drives. And I’m buying six, eight, maybe ten, if I’m really feeling lucky. Maybe if I’m buying for the full platter, for the full range of 15, maybe I’m buying 15 plus a couple for parity, to have replacement disks. But even then, that’s super-expensive for someone like me… Not someone like you, because you buy 5,000 8-terabyte drives at a time. Massive check, right?

Me - way different. Or the people like me - way different. So let’s dovetail into how can you leverage what you know about the way you buy drives, to give some guidance to consumers out there that are homelabbers, that are building out 4-drive arrays, 6-drive arrays, 12-drive arrays, whatever it might be. Give me some examples from what you know, with all these drive stats, these 10 years of drive stats, to how you buy drives. What are some of your recommendations for consumers or homelabbers when buying hard drives?

So that’s a really good question, and it does come up… And you’re absolutely right, somebody with a handful of drives, or a small number of drives has to think differently. And I think one of the reasons why the data, what we do has been popular, if you will, for the last number of years is because there’s such a dearth of information out there.

Other than that, you go to the manufacturer, and you could take every data sheet produced by every single manufacturer and just change the name, and they look identical. And they almost have the same numbers on them.

For sure.

And so they’re of very little use from that perspective. But there are some things you can do as a consumer. One is you can – manufacturers try to match the use of the drive to the mechanics inside of it a little bit, and the firmware that’s inside of it, and so on. And so you might look at that. So if you’re doing video recording, you’re just recording your video systems or something like that, that’s a different use case; then you might be using it where you’re storing media on it, and you want to access your movies, and you’ve created a Plex server, or whatever the case may be. Versus Joe, person who’s looking for an external drive because they have a computer and they want to put some data on an external unit.

So I think what we give people from our perspective is at least data to help make a decision. Now, where else do you get it from? There’s various sites that do produce it, there’s folks like yourself, who work in a home lab thing and say “Hey, I had success with this.” And I think you need all of that information in order to make a solid decision. I don’t think it’s a monetary one, although I completely understand why people make a monetary decision. You know, “Gee, I can buy that for $179, and that one cost me $259, and they’re the same size. And I don’t really have $179, much less $259, so I guess I’m going to buy that one.” So I understand those decisions, and you cross your fingers from that perspective.

[01:18:18.22] The other little thing - it’s just the wild card in all of this; you never know when you’re gonna get a really good drive model, or a really bad drive model. And you could buy a drive, and it’s the, let’s just say DX000 model, right? And you bought it, and it’s been great, it’s been running for years, and your friend says, “What do you use there?” and like “Oh, I’m using the DX000.” And he goes “Great.” And he goes to the store, and he can’t get that, but he can get a DX001. Pretty close, right? And it fails three days out of the box. [laughs]

So you have to be somewhat precise, but you also have to get – you also see the personalities of the different manufacturers. And I’ll go back to Seagate. Seagate makes good, solid drives, that are a little less expensive. Do they fail more often? Maybe. But there are certainly some good models out there. And it doesn’t necessarily correlate to price, by the way. We’ve seen that. And it doesn’t correlate to enterprise features. It seems to just be they made a really good model.

The other thing I would do is if you’re buying a drive, I would buy a drive that’s been in production about maybe a year, maybe six months at least, and then look and see what people are saying on websites, various consumers. Don’t go to the pay-for-play review sites, because you just buy your way up the list. But “Hey, I’m thinking of using this particular model”, and then pay attention to the model they’re using. And then when you go to buy it, make sure you get the same one, because again, they don’t have to be the same.

Use our data wherever it’s helpful, to help maybe guide you a little bit towards where you might find the right ones, and maybe the ones to stay away from a little bit… But at the end of the day, that’s just one piece of the information that you’re going to have to dig up. And there just isn’t a testing agency for drives out there.

You all should be that. [laughs]

We get people begging us for that… We have people literally saying –

Spin off a division, or something like that.

That’s right. Wouldn’t that be fun?

Yeah, I mean, realistically… I mean, you’ve done a lot of the hard work in quantifying the value of the data, and you’ve been consistent with the ability to capture it, and then report on it at a quarterly and yearly basis, which I just commend you on. Thank you for that. And you give it out for free. You don’t say, “Hey, for Backblaze customers, you can see the data.” It’s free for everybody to see. And I think you even have like downloads of the raw data, if I recall correctly. I didn’t know what to do with it, but I’m like “Great, it’s there.” If I wanted to dig into it further, then I could. But yeah, there should be some sort of drive testing…

But what a hard thing to do. I mean, especially, as you probably know, models change so quickly, and the model numbers don’t seem to have any rhyme or reason to them; they just seem to be like “Okay, we’re done with building that one, and now we’re going here.” And it’s also based on geography; it may be made in Taiwan, it might be made in Vietnam, it may be made somewhere else… And these things also play a role in it. It could have been something geographical in that area; there could have been a storm, there could have been an earthquake, or a hurricane, or something catastrophic, or who knows what. There are things that happen in these manufacturing plants when they make these drives that affect consistency.

[01:21:45.12] I’ve even heard to buy not in the same batch. So don’t buy more than x drives from, let’s say B&H. Buy two from B&H, two from CDW… Obviously, buy the same model, if you can, to try and keep the model number parity… But I’ve heard all these different – essentially, old wives’ tales on how to buy hard drives as a consumer. And really, it seems to be cargo-culted, or learned, from somebody else, or just fear, essentially. “This is why I do it, because it’s a fear.”

And the way I’ve kind of done it is based on the capacity, first. So I think, “How big do I need?” So I begin with my capacity, because I’m different. I want to get to the price curve eventually, but my deal is “How much do I want to have? How many drives can I actually handle?”, and then at that level, what’s my parity level? Can I afford to have a couple extra, so if those two fail in that parity - let’s say a RAID-Z2 in a ZFS file system array, as an example - if those two drives fail, can I replace them? Do I have two more drives to replace them if two did fail?
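
A quick worked example of that kind of planning - just a sketch, using the 18-terabyte drives mentioned earlier, RAID-Z2’s two parity drives, and an assumed pair of cold spares:

```python
# Back-of-the-envelope capacity planning for one RAID-Z2 vdev.
# The drive count, size, and spare count are example numbers, and this
# ignores ZFS metadata/overhead, which takes a bit more in practice.
drives_in_vdev = 6
drive_tb = 18
parity_drives = 2       # RAID-Z2 survives two simultaneous drive failures
cold_spares = 2         # replacements on the shelf for when those two fail

usable_tb = (drives_in_vdev - parity_drives) * drive_tb
total_to_buy = drives_in_vdev + cold_spares

print(f"Usable capacity: ~{usable_tb} TB")   # ~72 TB
print(f"Drives to buy:   {total_to_buy}")    # 8
```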

I hadn’t considered your cloning idea, which I think is super-smart. I’m gonna have to consider that. I might just do some hard drive failure tests just to see how that could work. That seems so smart, to clone versus resilver… Although I don’t know how that would work with ZFS, if that’s a thing or not. But capacity is where I begin. Then it’s like “Okay, for price, did I get that?” And then the final thing I do once I actually get the drives - I hadn’t considered running the SMART test right away to check how many power-on hours they had, because I didn’t consider they’re doing tests in there… But I thought, “Well, hey, if Seagate is doing a burn-in of sorts on my drives, or some sort of test beforehand, let me know.” I would buy a model that has burn-in testing beforehand. Save me the week, if I’m gonna burn in an 18-terabyte drive.
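
For anyone wanting to do that power-on-hours check, a minimal sketch using smartctl - it assumes smartmontools is installed, /dev/sda is the right device, and you have the privileges to read it:

```python
# Minimal sketch: read a drive's power-on hours from its SMART attributes.
# Some drives report the raw Power_On_Hours value in a composite format,
# which this naive parse won't handle.
import subprocess

def power_on_hours(device: str) -> int | None:
    out = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    ).stdout
    for line in out.splitlines():
        if "Power_On_Hours" in line:
            return int(line.split()[-1])  # raw value is the last column
    return None

print(power_on_hours("/dev/sda"))  # a few hours out of the box hints at factory testing
```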

So when I bought this new array recently, the burn-in test lasted seven full days. I don’t know if you use this software now, it’s called badblocks… But you can run a series of tests, it writes three different test patterns, and then a final one, which is the zeros across it… But for each write, there’s a read comparison. So it’s a write across the whole disk, in one pattern, then a read, another write, then a read, another write, then a read, and then finally, a zero pass write, and then a recomparison to confirm that the drive is actually clean. But for an 18-terabyte drive, six drives, it took an entire week. And that’s just a tremendous amount of time for somebody who’s like “I just want to get onto building my thing… Come on now.”
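
For reference, that badblocks pass is its destructive write-mode test - here’s a small sketch of kicking it off from Python; the device path is a placeholder, and -w wipes the drive, so treat it carefully:

```python
# Sketch of launching a destructive badblocks burn-in from Python.
# WARNING: -w overwrites the entire drive - triple-check the device path.
# Assumes the badblocks tool (from e2fsprogs) is installed and you run as root.
import subprocess

def burn_in(device: str, block_size: int = 4096) -> int:
    # -w: write-mode test (patterns 0xaa, 0x55, 0xff, then 0x00, each read back and verified)
    # -s: show progress    -v: verbose    -b: block size in bytes
    cmd = ["badblocks", "-wsv", "-b", str(block_size), device]
    return subprocess.run(cmd).returncode

# burn_in("/dev/sdX")  # placeholder device; expect this to run for days on an 18 TB drive
```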

But that’s the way I look at it. Like, that’s how I’ve learned to buy: “What capacity do I want to have?” And then “Can I afford it?” Just the drives alone. And then “Can I afford the extras if I need parity and replacement for that parity?” Of course you want parity. And then finally, doing a final burn-in before actually putting the drives into service… Which I feel like is a little overkill, to some degree… But you know what? The worst thing to do is to build this full array - I’m not a business, I have limited time - and then I’ve got to deal with failures a week or so later. Now, that burn-in test may not predict a failure a week later, but it might mitigate it, because, well, if drive four of six did not pass the sector test in badblocks, well then, let’s send that one back for an RMA, or just a simple straight-up return kind of thing. And you know before you even build the array that you’ve got a problem child, essentially.

And the other thing is… Running that kind of software, if there is a media error - which happens; it just does - the drive maps around it, and you don’t even know it, other than it might tell you that… But if you put your system in play before you do that, and it finds the error then, the same thing can happen, but now your system runs a little slower for a period of time until it figures out how to map around that.

For sure. The only other thing I want to talk to you about is what I think is a newer player in your drive stats, which is SSDs. Now, I think you only use them as boot drives, not as like storage drives in your B2 service, or your at-large data services… And I think the reason why you made these choices is you’re very pragmatic with the hardware you buy. Like, only buy the things you need, and you keep, I guess, your expenses, or your cost of goods sold, low, because you want to have the lowest-cost storage services out there, whether it’s B2 or whatnot. That’s how I understand Backblaze’s business model and thought processes when y’all spend money.

[01:26:22.21] So with SSDs, obviously you’re replacing the older hard drives that maybe served as the boot drive - which, as you know, is the drive that’s running the operating system itself on the thing. Now, I’ve gotta imagine this 52U array that you have, or this 52U rack you have - you’ve only got one server in there, but you’ve got… What was it, eight? Eight storage pods, and then you’ve got one actual server. So is all that hooked back to that server? And then tell me about the SSDs.

Yeah, so actually - well, just to kind of set the thing… A storage pod is actually more than just storage; it’s actually its own storage unit. It’s a server. So there is a CPU in there, there’s all of the other components, and everything like that. So it’s not like a [unintelligible 01:27:05.14] kind of thing…

…which - each server is its own server unit. It’s got its own intelligence, its own processor, its own 10G network connections, and whatever else. And so each one has its own boot drive as well. So that’s where we get them all from.

The boot drive for us does more than just boot the system. Like I mentioned earlier, it stores some of the SMART stats on it for a period of time, it actually stores access logs and a few other things that we keep track of on a regular basis… Because one, there’s a compliance aspect to doing that, and then two, there’s just a maintenance aspect, and a debugging aspect. When something goes a little wonky, you want to be able to look through various logs. So we keep those for a period of time. And they get stored on that SSD as well, or the boot drive as well.

The SSD is - to be honest, we started using those because the price point came down to the point where we wanted to pay for it. [laughs]

Yeah. Performance probably made sense too, and then price made sense.

Yeah. And we’ve tried different ones over the course of it. We’ve talked about building a storage pod out of SSDs. And in fact, some of our competitors are even talking about and doing some of those things. The cost point just doesn’t make sense yet. And the reality is the speed, which is what most people would think they would be getting - it’s not the drive where any slowness happens. It’s not even, quite frankly, in our systems. I mean, we’re dropping 100-gigabit NIC cards in these things, right? [unintelligible 01:28:45.00] And a lot of it is it just takes a while to get the data from you, even just to next door. Forget about getting it – and so the SSDs are a pretty cool idea, and I guarantee you, when the price point gets to the right spot, we’ll do it.

Backing up somebody’s data, whether it takes 10 milliseconds, or whether it takes 12 milliseconds, is not an interesting thing. And you shouldn’t have to pay a premium to get that last little tiny bit. And to your point, that’s where we live. We want to provide a solid, good, well-performing service at an economical price. That’s what we do. SSDs don’t fit into that as data servers at this point. They’re still too expensive. And the use cases could be interesting… The read/write, the number of writes, and stuff, could be an interesting – do they wear out under that environment? People have been using them in what we call caching servers in order to do things, and the reads and writes on those are enormous. So you could literally burn through those servers and those SSDs in six months. So is that economical? Did you really gain anything from a cost perspective? No, you didn’t. [laughs] Versus using a hard drive, which is going to last for three or four years doing the same kind of a thing. And at the end of the day, was it really faster? Did it really improve performance?

[01:30:20.12] And so those things still – the analysis for all of that is still ongoing, from our perspective. But I can see a day when we’re there. I can see a day when we’re using something besides hard drives to store customer data. But we will do it in a way that’s economical and practical.

Yeah. You said that sometimes you swap out 8-terabyte drives, and I’ve gotta imagine the largest SSDs out there tend to be 4 to 8 terabytes. But if you compare the costs to an 8-terabyte hard drive, it’s probably double; an 8-terabyte SSD is probably at least maybe four times the cost of an 8-terabyte hard drive, so… I mean, yeah, I’m not going to – when I buy Backblaze for backup services, or even B2 services, for example, which is like a similar S3 replacement, and you even support the API, as you’ve mentioned before… You know, I’m not looking for the caching and the speed necessarily. I mean, I want the speed to be there, but it’s not like “Well, I will pay double for Backblaze, because you give me SSD backups.” It’s just not something I’m trying to buy as a consumer, from my perspective. And that totally makes sense for your business model. That makes a lot of sense.
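
As a simple dollars-per-terabyte comparison - the street prices below are made-up placeholders just to show the rough ratio, not quotes:

```python
# Dollars per terabyte with made-up street prices, just to show the ratio.
hdd_price, hdd_tb = 140, 8    # hypothetical 8 TB hard drive
ssd_price, ssd_tb = 600, 8    # hypothetical 8 TB SATA SSD

print(f"HDD: ${hdd_price / hdd_tb:.0f}/TB")   # ~$18/TB
print(f"SSD: ${ssd_price / ssd_tb:.0f}/TB")   # ~$75/TB - roughly 4x the hard drive
```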

That’s why I wanted to talk to you about all these details, because the way you buy hard drives, and the way you manage these systems, is so unique in comparison. And I mean, we’ve just never examined the behind-the-scenes of a data center, or a storage data center, like you all operate - like, what does it take to build that… I know we’ve barely scratched the surface here. I’ve got probably 30,000 other questions that might go deeper and more technical on different software, and stuff like that… So we’ll leave it there for the most part, but it has been fun talking to you, Andy. Is there anything else that I haven’t asked you, that you wanted to cover, in your 10-year stat history - anything whatsoever that’s just on your mind that we can leave with before we tail off the show?

Well, I will say the next Drive Stats Report is coming up. That’s always fun. I think it’s due out May 4th. May 4th, yes. That’s Star Wars Day.

It’s my anniversary, too.

There you go. That’s even better. Last year we wrote one up on May 4th, and we did the whole thing with Star Wars themes and stuff like that… But I’ve dialed that back this year. So maybe one or two Star Wars references and that’ll be it. But congratulations on the anniversary, though.

Thank you.

But yeah, so that’s coming… I encourage folks who do read those things, if they have questions or comments, feel free. We’ll answer them. We try to do the best we can. We try to be open with how we do things, and why we do the things we do. So I always look forward to that. And ask the hard ones; we will give you the best answer we can with it. We are these days a public company, so I can’t – I don’t know how many things we can disclose at certain points, but we’ll do the best we can in getting you the information you’re asking for.

Yeah, I always appreciate digging into this. I don’t always dig as deep as I would love to, because of time, or just too much data, so to speak… Because it is kind of overwhelming. And the way you have to look at even like your drive failures by manufacturer, for example - it’s like “Well, that number may be higher for Seagate, but you also have a lot more Seagate drives in service.” A lot of corollaries you have to look at.

[01:33:49.03] You can’t just say, “Okay, let me go to Backblaze’s data and say ‘These are the drives I’m gonna buy.’” Well, it might be an indicator to a manufacturer, maybe not model or size particularly… But it might mean like “Okay, you seem to favor Seagate. You mentioned that your relationship was pretty good there.” I like Seagate. I’ve had great – I almost switched recently, when I was buying my newest array, and I was thinking about building – I was like “I’m gonna go with Western Digital.” I almost did that, but I’m like “Well, I’ve got these drives in service for this many years”, knock on wood, with zero failures, right? With zero failures. When you say that, things happen, so I’m sorry to say that…

[laughs]

But I’ve been running Seagate drives in service for as long as I’ve been running data stores, which has been a long time - probably eight-plus years, maybe ten years or more; longer than that. 13 years. So over time, I’ve always only ever used Seagate drives. I don’t know why I chose Seagate. Cool name. I liked IronWolf; cool brand name. All that good stuff. They’ve got some good stuff there. But the things I read about them were pretty strong, the warranty was good… And I’ve had good experiences with Western Digital as well, in terms of warranty; I’ve had some drives from them fail, and have to be RMAed, and the RMA process is pretty simple. That’s what you want to buy for - you want to buy a brand that’s reliable, you want to buy for parity, you want to buy for replacements of that parity, and to be able to swap it out easily… And then also, do you have a worthwhile warranty, and can you RMA pretty easily? RMA is just simply sending back the drive that’s got a failure, and they will replace it with the same drive that you got, or something that is equivalent. There are always circumstances that make that different, but… I’ve only had good responses from Seagate, as well as Western Digital. So those are the brands I stick with. But that can be an old wives’ tale, right? That’s Adam’s old wives’ tale of how I buy hard drives, you know?

It’s okay. People have to be comfortable with whatever decision they make. But the most important thing - and you built it into your system, right? …is to have a backup. And I don’t care what kind of backup it is. You don’t have to use our service…

Well, RAID isn’t a backup, it’s just parity. But yeah, I definitely have a backup.

Because if you lose it, it’s gone. So have a backup. And again, we’ve said this before - I don’t care if you use us, I don’t care if you use anybody. I don’t care how you do it, just get your stuff backed up, so that if something happens, you don’t lose the things that are precious to you. It’s as simple as that. And again, I don’t care who you do it with, or how you do it, just get it done.

Very cool. Well, Andy, thank you so much for taking an hour or so of your time to geek out with me on hard drives… Not always the – I’m curious how many people come to this episode, honestly, and are excited about this topic. It’s not the most fun topic, unless you’re a hard drive nerd like you and I might be.

Well, yeah.

I rather enjoy it. I think this kind of stuff is pretty fun. But I’m kind of curious what audience we bring to this one, because this is a unique topic for us to have on the Changelog.

Yeah, I appreciate the opportunity, and I hope some folks listen. It’s always fun to have folks listen to what you say, and then make comments on it, and all of that. There are some places where geeks hang out, and hard drive geeks in particular hang out, so maybe we’ll get a whole bunch of them together, and listen to it. But just the education of what goes on… I mean, you understand the complexity of a hard drive, and what’s going on inside there, right? And I understand that to some degree as well, and it is… It’s miraculous that that thing works, it does what it does, and it does it at the price points that they do it at… So we just need to have that appreciation for the technology, for as long as it’s around.

For sure. I agree. I mean, we definitely under-appreciate and take for granted the mechanics of a hard drive, as simple as that might be. Like, wow. I mean, on my MacBook Pro I don’t care, because I’m using an SSD. It’s actually probably NVMe SSD; or just straight-up NVMe, not NVMe SSD, but it’s in the M.2 format, or whatever it might be. At that point I’m not caring, but in other cases, yes. I mean, that’s what the cloud is built upon. Your cloud was built upon spinning rusty hard drives, that eventually fail. That’s not always the coolest topic, but it is crucial. It’s like almost mainframe-level crucial, right? We don’t think about mainframes too often; we had an episode about that… But how often do you talk about hard drives, and the simple thing that they are, but also the very complex thing they are? And like you said, the miraculousness of the fact that it actually works. But yeah, thanks so much, Andy; it’s been awesome talking to you. I appreciate you.

Thank you, Adam. It was great.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
