Webinar Preview (Part 3): ETL Techniques for Large Datasets
This is a preview. To access the entire conversation and additional resources, join the Orbweaver Community for free and become part of the growing group of leaders advancing the digital supply chain in the electronics industry.
Video Transcript:
Wilmer Companioni:
Alright. So along that same vein, Dave, and alluding back to something you mentioned earlier about measuring: I've written my ETL pipeline and I'm happy, but what are some of the things I should look out for that would indicate to me that things are about to go off the rails?
Dave Antosh:
The number one thing for me is making sure you test with realistic production data, or at least realistic production-size data if you wanna use something that's been sanitized.
If you're testing with ten items going through, but at the end of the day you're doing 2.5 million, you're not getting a very accurate look at your system. From there, you need to look at the tracking and metrics. If you're consistently running it but you see five documents out of 2.5 million didn't make it through, that could be an indication that something deeper is wrong with your system. You wanna avoid those race conditions that can be very difficult to debug, and having good metrics and a solid foundation in the system is a huge benefit and a huge help towards getting you there.
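The completeness check Dave describes, comparing documents in versus documents out per run, can be sketched in a few lines. This is a hypothetical illustration, not Orbweaver's implementation; the function name and tolerance parameter are invented for the example.

```python
# Hypothetical completeness check for an ETL run: compare documents in vs. out
# and flag even small shortfalls, which can signal race conditions upstream.

def check_run_completeness(docs_in: int, docs_out: int, tolerance: float = 0.0) -> list[str]:
    """Return a list of warnings for an ETL run; empty means the run looks clean."""
    warnings = []
    missing = docs_in - docs_out
    if missing > 0:
        rate = missing / docs_in
        warnings.append(
            f"{missing} of {docs_in} documents did not make it through "
            f"({rate:.6%} loss)"
        )
        if rate > tolerance:
            warnings.append("loss exceeds tolerance; investigate for race conditions")
    return warnings

# Five documents lost out of 2.5 million, as in the example above:
print(check_run_completeness(2_500_000, 2_499_995))
```

The point is that "five out of 2.5 million" is far too small to notice by eye, so the pipeline itself has to count and compare.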
WC:
Okay. And when you say good metrics, that sounds like some post-deployment stuff, like looking out for increased latencies or increased error rates or something like that. Right?
DA:
Yeah. And we have a lot of different versions of those that are consumable in different ways. There are some that appear right in the web app, ones that tell you how much is going through and how long things are taking. And as developers, we have much more finely grained stuff, and several logs. We're tracking a lot of different metrics: how much data is moving through on the queue, what the CPU looks like, what memory looks like, all those things to give us an idea of how the final transformation is behaving.
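A minimal sketch of the kind of per-item metrics Dave mentions, throughput, error rate, and latency, collected in one place. The class and field names are assumptions for illustration; a real system would push these to a monitoring backend rather than print them.

```python
# Hypothetical pipeline metrics: count items processed, errors, and per-item
# latency, then summarize them the way a dashboard or log line might.
import time
from dataclasses import dataclass, field

@dataclass
class PipelineMetrics:
    processed: int = 0
    errors: int = 0
    latencies: list = field(default_factory=list)  # seconds per item

    def record(self, elapsed: float, ok: bool = True) -> None:
        self.processed += 1
        self.latencies.append(elapsed)
        if not ok:
            self.errors += 1

    def summary(self) -> dict:
        avg = sum(self.latencies) / len(self.latencies) if self.latencies else 0.0
        return {
            "processed": self.processed,
            "error_rate": self.errors / self.processed if self.processed else 0.0,
            "avg_latency_s": avg,
        }

m = PipelineMetrics()
for _ in range(1000):
    start = time.perf_counter()
    # ... transform one document here ...
    m.record(time.perf_counter() - start)
print(m.summary())
```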
WC:
Alright, Dave. I’m gonna throw you a different one that I don’t think we’ve talked about.
So let me know if it works. If it doesn’t, we’ll just cut it out.
Alright, Dave. So what do you find particularly unique about the data that we deal with here in the electronics industry that presents new challenges? Right? Earlier we mentioned the breadth and the depth.
But what about the diversity of data, things like that, that make our challenges here at Orbweaver a bit different from typical data challenges?
DA:
Yeah.
One of the major things is where you do get that one-off case. So we have to make sure our data model is very flexible.
There are situations where you think you have everything planned out ahead of time, and then somebody comes in with an absolute do-or-die requirement for their data that you never thought of. You can't be two and a half years into designing a data model and have it be so rigid that it can't support anything new. So a certain level of flexibility is built in from the start. One of the challenges is making that performant and making sure your code is reliable, because flexibility can introduce all kinds of fragility issues. You gotta make sure all the developers are on the same page, and things are well documented and well mapped.
WC:
That's interesting. So it sounds like you're saying that in your data model design and in your transform design, you have to be flexible, but not generic.
DA:
That's a fair way to put it, and you have to have ways to make changes to that model quickly. So we rely on technologies where we can do that. We don't get locked into a single approach forever.
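One common way to be "flexible, but not generic" is to keep the core business fields typed and validated while routing customer-specific one-off requirements into a documented extension map, rather than stuffing them into unused columns. The sketch below is hypothetical; the class, field names, and the idea of recording who added an extension are assumptions, not Orbweaver's actual model.

```python
# Hypothetical "flexible but not generic" record: core fields are typed and
# required; one-off customer requirements go into a documented extensions map
# instead of being squeezed into unrelated fields.
from dataclasses import dataclass, field

@dataclass
class PurchaseOrder:
    po_number: str        # core, required, typed
    supplier: str
    line_count: int
    extensions: dict = field(default_factory=dict)  # one-off requirements land here

    def set_extension(self, key: str, value, owner: str) -> None:
        # Record who added the field, so it isn't forgotten a year later.
        self.extensions[key] = {"value": value, "added_by": owner}

po = PurchaseOrder("PO-1001", "Acme Components", 3)
po.set_extension("customer_routing_code", "R-77", owner="integration-team")
print(po.extensions["customer_routing_code"]["value"])
```

Tracking ownership in the extension itself is one guard against the "a year later, nobody remembers where that data lands" failure mode described below.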
WC:
Okay. So that's what allows us to support all the various data models that exist in the transacting of business in our particular industry. Right? We have quotes to process, invoices to process, POs and sales orders. All of those are different, but somewhat correlated, data models. Right?
DA:
Yep.
And if you don't take that into consideration... We've done the over-specified, no-flexibility approach before, and what tends to happen is that somebody has some requirement and you have to put that data somewhere. They're not using some other field, so you put it in that field.
And then a year later, everybody forgets where that data lands.
Somebody stops working on that project, they forget they did that, and it's game over.
Yeah.
WC:
Alright.
Thanks, Dave. Alright. I’ll let I’ll let you off the hook now.
So data challenges are everywhere, particularly in the electronics industry. Here at Orbweaver, we apply the skill sets of talented engineers like Dave to bring speed and scale to your data automation solutions for the electronics industry.
So for Dave, I’m Wilmer Companioni, and I’ll catch you at our next webinar.