• Published: May 26th, 2009
  • Comments: no responses

The Implications of Excellent Coding: Part 2 – Real world examples

Tags: , , , ,

In this two-part podcast Rob Steward explains the difference between good and bad coding, and how good coding can benefit your ODBC, JDBC, or .NET applications.

Part 2, which runs 7:05, provides real-world examples of the impact that good coding can make

Click on the following link to listen to the podcast: http://dataaccesshandbook.com/media/RobSteward_GoodCode_2.mp3

Rob Steward:

I’ll give you one other example that I like to use when I’m talking on performance, because it’s a pretty dramatic example. So I’ve been giving talks at conferences and seminars over the last 10 to 12 years now. And one of the tips that I’ve given out for a number of years now is controlling how you do transactions. So most people don’t realize that within all of those standards based APIs that I’ve talked about that you’re automatically in auto commit mode by default. What that means is, when I execute a statement, no matter how I execute it, it’s automatically committed. Now most databases don’t actually have an auto commit mode, but those standards based APIs require that by default, that’s the way you are. So I execute my insert into table, and it’s automatically committed.

Now, as apposed to manual commit mode, which is probably what most people think about when they think about transactions, where you begin a transaction, you do an insert or maybe multiple inserts or updates, and then actually issue the commit. All of these standards based APIs in order to make it easier on the programmers automatically commit everything for you. This is a good thing in most cases, but in some cases, that’s not what you want to happen. Obviously if you want to tie multiple inserts and updates within a single unit of work or within a single transaction, then you want to manually control it. You want to say start the transaction, issue the statement, and then issue the commit. And I think most people understand that.

But the thing that I’ve seen over and over is that you may want to do thousands of inserts or thousands of updates, but most people don’t understand the cost of auto commit mode is. So if we think about it, and think it through, what happens in the driver because of auto commit? If you’re in a database that doesn’t support an auto commit mode, which is most of them, most of them do not have a little switch that says ‘just automatically commit.’ Most of them require that what the driver does then in order to successfully do an auto commit is you execute a statement, what really happens underneath in the driver is it sends the statement across and it also sends a commit. So what that means is that for every statement that you’re executing, there’s actually two that are going across the wire. There’s your statement, and then there is a commit that goes behind it, even though you don’t see that. It’s required that the driver do that underneath you.

One of the things that a good driver or a good application will try to do is avoid network I/O. So now if we say we’re in auto commit and for every insert that we do we incur two network roundtrips. But not only that, but one of the most expensive things that we do on a database is commit. So what happens on a database when we do a commit, is its got all of your changes in memory and when you do the commit, when it responds with a success what it’s doing is guaranteeing that those changes will actually be persisted regardless of what happens next. So once you’ve got a success back from a commit or a rollback, the database is guaranteeing that no matter what happens, even if the power goes out or the machine goes down, that change is persistent. So what it has to do is, when you do the commit, is it has to flush some things out the disk. Not only does it flush them out the disk, but it tends to be non-sequential disk I/O.

So we all know that reading and writing from disk or to disk is one of the most expensive things that you can do, so we try to avoid that. When you do a commit you’re going to write to the disk. But not only that you’re going to write, typically because you’re updating the indexes and other things involved in a table, you’re going to write to very different places on the disk. So it’s non-sequential disk I/O.

The tip that we talk about in the book, and the thing that I wanted to point out on this podcast, is that if you don’t need to be in auto commit mode, then you can save significant amounts of time by switching into manual commit mode. And we go into the book exactly the code you write to do that. It’s also covered in all of the specifications for all of those APIs. But essentially, instead of going insert, insert, insert – you know, I’ve issued my four or five or 1,000 inserts or whatever it is – what I do is start transaction, move into a manual transaction control, issue all those inserts, and then do a commit.

Now I’ve given out this tip out for a number of years now. A couple of years ago I had this guy come up to me after a seminar, and he said, “Rob, I heard you say this tip a year ago, and I was doing this operation that actually involved doing this operation which actually involved doing a 5 million row insert into a DB2 table.” And he said, “It would take us 8 hours to do the insert. I went back after hearing you give this one tip on switching out of auto commit mode, changed my code, took me five minutes to do, I reran it, and it took 10 minutes.”

Now that’s an extremely good example, but most people aren’t doing 5 million row inserts all the time. But it does show you the kind of difference that one little coding change can make. They were doing the 5 million row inserts, they went from taking 8 hours to taking 10 minutes just by implementing this one tip that we cover in the book. That’s the kind of difference that good code and bad code can make. Both of those things work, but 8 hours verses 10 minutes is definitely the success or failure of that product or that project.

Those are the kinds of things that we talk about. Again, I would encourage you to read the book, because we go into a lot of these things and explain why they’re true and give you very specific code examples as to what kind of things are good verses bad.



  • Published: May 19th, 2009
  • Comments: no responses

The Implications of Excellent Coding: Part 1 – Best Practices

Tags: , , , ,




In this two-part podcast Rob Steward explains the difference between good and bad coding, and how good coding can benefit your ODBC, JDBC, or .NET applications.

Part 1, which runs 7:05, focuses on coding best practices, while Part 2 provides real-world examples of the impact that good coding can make.

Click on the following link to listen to the podcast: http://dataaccesshandbook.com/media/RobSteward_GoodCoding_1.mp3

Rob Steward:

Great question, and there are large segments of the book devoted to answering that questions. Specifically, there are three chapters of the book, one for ODBC, one for JDBC, and one for ADO.NET that are devoted solely to good coding techniques. There are a lot of coding samples in there, there’s a lot of explanation of why one thing is good verses bad. I’ll give you a couple of examples of what we talk about in the book.

For example, with any of those APIs that I just mentioned, ADO.NET, ODBC or JDBC, there are multiple ways to execute a statement. For example, in ODBC, you could call a function called SQLExecDirect, or you can call SQLPrepare followed by SQL Execute. Now most people do not understand when is the write choice to use one or the other of those methods. There are advantages, particularly performance and scalability advantages, to picking one verses the other. In the book we talk about this, when do we choose one or the other, but I’ll walk through that one example here.

Prepare and Execute actually is very different than Exec Direct. I can execute the same statement either way. It’s going to work either way, but depending on what I want to do one is going to work much better than the other. So Prepare and Execute, two APIs that I have to call rather Exec Direct.

I should use Prepare and Execute anytime I have a parameterized SQL statement that I intend to execute more than once, and here’s why: Because when I call Prepare, what I actually do, I incur an extra round trip to the database server. So it’s an extra command that I send up there, I do the Prepare, it sends the SQL statement. What it tells the database server to do is generate an execution plan. Now an execution plan on a database server is really – when it goes through your SQL statement and you may do something like ‘select * from table where last name equals Steward.’ What an execution plan would look like for a statement like that is sort of a structure that says, okay, the tables involved in this SQL statement are whatever table names (employ table names), and the columns in the where clause are last names. And I know that there is an index built on last names, so I store that information to say okay, when I evaluate and I look for where last name equals Steward then I’m going to use X index. And it loads this thing in the memory and it holds onto it. So that when I actually go to execute that statement, it doesn’t have to parse my SQL anymore; it’s already done that, and it already knows which indexes to use, and which table to load into the memory, and which segments those indexes may need or may not need, and whether it needs to make temporary tables, and all kinds of things like that are in the execution plan. That takes some time to generate.

When you do a Exec Direct what that does, and the driver sends that across the Exec Direct, then the database will execute your SQL statement, evaluate it, generate the result set for it, and then toss out that execution plan. When I do a Prepare and Execute, once I’ve done the execute it generates the results, but it doesn’t toss out the execution plan. In an example like I just gave you, a pretty simple SQL statement, select * from table where last name equals Steward, even in that case the generation of the execution plan is actually going to take significantly longer than generating the results set.

So if I generate it once, that same statement, I’m actually better off because I saved the database roundtrip by using Exec Direct, because I don’t need the database to save that execution plan. I’m not going to execute the statement again. Now if I want to do that execution over and over, then I probably want to use Prepare and Execute. Because I do that initial prepare, the database spends all time generating the execution plan, but when it’s done executing my statement, it holds onto it. And it knows that when it sees that statement again, ‘I don’t need to go back through that logic and figure out the indexes and the tables and all that other stuff, I already know what it is.”

Now, I said at the beginning parameterize SQL. So instead of where ‘last name equals Steward,’ I would have, ‘where last name equals ?’. And then I would bind those parameters. I would bind that parameter value to some buffer in my application. In the case of JDBC or ADO.NET it would just be some string object that is bound to that ? or to that parameter within the SQL statement. Instead of literal value, so instead of using a literal value, so instead of saying where last name equals ‘Steward,’ I want to use parameter markers. Prepared statements in databases within their execution plans understand that there is no difference as far as what tables are involved and indexes, and the parsing of the SQL, and the data types and things involved, there’s no difference regardless of what that value is. So it can be Steward, it can be Smith, it can be Jones, it doesn’t matter, the execution of the SQL statement is the same. So if I’m going to execute it over and over, I want to put parameter markers in there, and I want to use Prepare and Execute. That’s just one tip. There are many of these things, and many tips covered within the book where we say, ‘here’s a choice that you can make, this choice is better for these situations and this other choice is better for these other situations.’



  • Published: April 21st, 2009
  • Comments: no responses

Why The Data Access Handbook is a Must-Have Resource for Architects, Programmers and DBAs

Tags: , , , , , , , , , , ,

In this podcast Rob Steward explains who The Data Access Handbook was written for, and what benefits they will get out of its content. The podcast runs for 3:00.

You can listen to the podcast by clicking on the following link: http://dataaccesshandbook.com/media/Rob7.mp3

From the Podcast:
Why would the development audience and the database architect be interested in The Data Access Handbook?

Rob Steward:

We wrote the book specifically targeting architects, programmers and DBAs. That may seem like a broad audience, but the reason that I can say that we targeted all three of them is that the book has different sections. The beginning of it we talked more towards the architect, but it’s also beneficial for the DBA or the programmer, in that we go through the concepts of what is middleware and why does it affect things. We talk about how it actually interacts with the network. How it talks to the database. What does that mean in your overall architecture? What are the architecture choices that you make, specifically concerning the middleware and how you access your data that have a great impact on your overall performances? And that’s kind of targeted again towards architects, but also to programmers.

And then we’ve got sections of the book – there’s three chapters where we go through very specific code examples using ODBC, JDBC and ADO.NET – and those three chapters are meant to be a reference for programmers to say, ‘If we take the concepts that we’ve gone over in the beginning of the book, how to we apply those within those standards?’ What does it mean when I say, ‘you need to make sure that you manage your transactions correctly?’ What does that actually mean when it comes to writing codes ODBC and JDBC? Therefore, very specific content for programmers, but, again, backup so the architects who first look at the problem or the applications, they understand on a conceptual level of how this thing needs to be designed. And then for the programmers, what are those specifics to code that you want to write to implement the architectures that we’ve talked about.

And then for DBAs – you know I talk to DBAs all the time – and DBAs are frustrated with the same thing; I’ve been told over and over by DBAs, ‘Well everybody always blames everything on the database; that it’s always in the database.’ And those guys, the DBAs, are probably the ones that know best that the problem’s not always in the database. Because they can tune and they can do these things, and it doesn’t make any difference. So they know it’s somewhere outside of the database, but again they’re not the programmers. But what this book does for DBAs is explain some of these things that the programmers are doing that has that big effect on the overall performance. So they can understand when those problems are on in the database, what kind of problems those could be. And what do they need to tell people to look for in their applications.

So I can say that we targeted this book at architects, programmers and DBAs, and I think there’s a lot in this book for all three of ‘em.


Book Content Copyright © 2009 by Prentice Hall PTR. All rights reserved. | Corporate Sponsor DataDirect Technologies.