• Published: May 26th, 2009
  • Comments: no responses

The Implications of Excellent Coding: Part 2 – Real world examples

Tags: , , , ,

In this two-part podcast Rob Steward explains the difference between good and bad coding, and how good coding can benefit your ODBC, JDBC, or .NET applications.

Part 2, which runs 7:05, provides real-world examples of the impact that good coding can make

Click on the following link to listen to the podcast: http://dataaccesshandbook.com/media/RobSteward_GoodCode_2.mp3

Rob Steward:

I’ll give you one other example that I like to use when I’m talking on performance, because it’s a pretty dramatic example. So I’ve been giving talks at conferences and seminars over the last 10 to 12 years now. And one of the tips that I’ve given out for a number of years now is controlling how you do transactions. So most people don’t realize that within all of those standards based APIs that I’ve talked about that you’re automatically in auto commit mode by default. What that means is, when I execute a statement, no matter how I execute it, it’s automatically committed. Now most databases don’t actually have an auto commit mode, but those standards based APIs require that by default, that’s the way you are. So I execute my insert into table, and it’s automatically committed.

Now, as apposed to manual commit mode, which is probably what most people think about when they think about transactions, where you begin a transaction, you do an insert or maybe multiple inserts or updates, and then actually issue the commit. All of these standards based APIs in order to make it easier on the programmers automatically commit everything for you. This is a good thing in most cases, but in some cases, that’s not what you want to happen. Obviously if you want to tie multiple inserts and updates within a single unit of work or within a single transaction, then you want to manually control it. You want to say start the transaction, issue the statement, and then issue the commit. And I think most people understand that.

But the thing that I’ve seen over and over is that you may want to do thousands of inserts or thousands of updates, but most people don’t understand the cost of auto commit mode is. So if we think about it, and think it through, what happens in the driver because of auto commit? If you’re in a database that doesn’t support an auto commit mode, which is most of them, most of them do not have a little switch that says ‘just automatically commit.’ Most of them require that what the driver does then in order to successfully do an auto commit is you execute a statement, what really happens underneath in the driver is it sends the statement across and it also sends a commit. So what that means is that for every statement that you’re executing, there’s actually two that are going across the wire. There’s your statement, and then there is a commit that goes behind it, even though you don’t see that. It’s required that the driver do that underneath you.

One of the things that a good driver or a good application will try to do is avoid network I/O. So now if we say we’re in auto commit and for every insert that we do we incur two network roundtrips. But not only that, but one of the most expensive things that we do on a database is commit. So what happens on a database when we do a commit, is its got all of your changes in memory and when you do the commit, when it responds with a success what it’s doing is guaranteeing that those changes will actually be persisted regardless of what happens next. So once you’ve got a success back from a commit or a rollback, the database is guaranteeing that no matter what happens, even if the power goes out or the machine goes down, that change is persistent. So what it has to do is, when you do the commit, is it has to flush some things out the disk. Not only does it flush them out the disk, but it tends to be non-sequential disk I/O.

So we all know that reading and writing from disk or to disk is one of the most expensive things that you can do, so we try to avoid that. When you do a commit you’re going to write to the disk. But not only that you’re going to write, typically because you’re updating the indexes and other things involved in a table, you’re going to write to very different places on the disk. So it’s non-sequential disk I/O.

The tip that we talk about in the book, and the thing that I wanted to point out on this podcast, is that if you don’t need to be in auto commit mode, then you can save significant amounts of time by switching into manual commit mode. And we go into the book exactly the code you write to do that. It’s also covered in all of the specifications for all of those APIs. But essentially, instead of going insert, insert, insert – you know, I’ve issued my four or five or 1,000 inserts or whatever it is – what I do is start transaction, move into a manual transaction control, issue all those inserts, and then do a commit.

Now I’ve given out this tip out for a number of years now. A couple of years ago I had this guy come up to me after a seminar, and he said, “Rob, I heard you say this tip a year ago, and I was doing this operation that actually involved doing this operation which actually involved doing a 5 million row insert into a DB2 table.” And he said, “It would take us 8 hours to do the insert. I went back after hearing you give this one tip on switching out of auto commit mode, changed my code, took me five minutes to do, I reran it, and it took 10 minutes.”

Now that’s an extremely good example, but most people aren’t doing 5 million row inserts all the time. But it does show you the kind of difference that one little coding change can make. They were doing the 5 million row inserts, they went from taking 8 hours to taking 10 minutes just by implementing this one tip that we cover in the book. That’s the kind of difference that good code and bad code can make. Both of those things work, but 8 hours verses 10 minutes is definitely the success or failure of that product or that project.

Those are the kinds of things that we talk about. Again, I would encourage you to read the book, because we go into a lot of these things and explain why they’re true and give you very specific code examples as to what kind of things are good verses bad.



  • Published: May 19th, 2009
  • Comments: no responses

The Implications of Excellent Coding: Part 1 – Best Practices

Tags: , , , ,




In this two-part podcast Rob Steward explains the difference between good and bad coding, and how good coding can benefit your ODBC, JDBC, or .NET applications.

Part 1, which runs 7:05, focuses on coding best practices, while Part 2 provides real-world examples of the impact that good coding can make.

Click on the following link to listen to the podcast: http://dataaccesshandbook.com/media/RobSteward_GoodCoding_1.mp3

Rob Steward:

Great question, and there are large segments of the book devoted to answering that questions. Specifically, there are three chapters of the book, one for ODBC, one for JDBC, and one for ADO.NET that are devoted solely to good coding techniques. There are a lot of coding samples in there, there’s a lot of explanation of why one thing is good verses bad. I’ll give you a couple of examples of what we talk about in the book.

For example, with any of those APIs that I just mentioned, ADO.NET, ODBC or JDBC, there are multiple ways to execute a statement. For example, in ODBC, you could call a function called SQLExecDirect, or you can call SQLPrepare followed by SQL Execute. Now most people do not understand when is the write choice to use one or the other of those methods. There are advantages, particularly performance and scalability advantages, to picking one verses the other. In the book we talk about this, when do we choose one or the other, but I’ll walk through that one example here.

Prepare and Execute actually is very different than Exec Direct. I can execute the same statement either way. It’s going to work either way, but depending on what I want to do one is going to work much better than the other. So Prepare and Execute, two APIs that I have to call rather Exec Direct.

I should use Prepare and Execute anytime I have a parameterized SQL statement that I intend to execute more than once, and here’s why: Because when I call Prepare, what I actually do, I incur an extra round trip to the database server. So it’s an extra command that I send up there, I do the Prepare, it sends the SQL statement. What it tells the database server to do is generate an execution plan. Now an execution plan on a database server is really – when it goes through your SQL statement and you may do something like ‘select * from table where last name equals Steward.’ What an execution plan would look like for a statement like that is sort of a structure that says, okay, the tables involved in this SQL statement are whatever table names (employ table names), and the columns in the where clause are last names. And I know that there is an index built on last names, so I store that information to say okay, when I evaluate and I look for where last name equals Steward then I’m going to use X index. And it loads this thing in the memory and it holds onto it. So that when I actually go to execute that statement, it doesn’t have to parse my SQL anymore; it’s already done that, and it already knows which indexes to use, and which table to load into the memory, and which segments those indexes may need or may not need, and whether it needs to make temporary tables, and all kinds of things like that are in the execution plan. That takes some time to generate.

When you do a Exec Direct what that does, and the driver sends that across the Exec Direct, then the database will execute your SQL statement, evaluate it, generate the result set for it, and then toss out that execution plan. When I do a Prepare and Execute, once I’ve done the execute it generates the results, but it doesn’t toss out the execution plan. In an example like I just gave you, a pretty simple SQL statement, select * from table where last name equals Steward, even in that case the generation of the execution plan is actually going to take significantly longer than generating the results set.

So if I generate it once, that same statement, I’m actually better off because I saved the database roundtrip by using Exec Direct, because I don’t need the database to save that execution plan. I’m not going to execute the statement again. Now if I want to do that execution over and over, then I probably want to use Prepare and Execute. Because I do that initial prepare, the database spends all time generating the execution plan, but when it’s done executing my statement, it holds onto it. And it knows that when it sees that statement again, ‘I don’t need to go back through that logic and figure out the indexes and the tables and all that other stuff, I already know what it is.”

Now, I said at the beginning parameterize SQL. So instead of where ‘last name equals Steward,’ I would have, ‘where last name equals ?’. And then I would bind those parameters. I would bind that parameter value to some buffer in my application. In the case of JDBC or ADO.NET it would just be some string object that is bound to that ? or to that parameter within the SQL statement. Instead of literal value, so instead of using a literal value, so instead of saying where last name equals ‘Steward,’ I want to use parameter markers. Prepared statements in databases within their execution plans understand that there is no difference as far as what tables are involved and indexes, and the parsing of the SQL, and the data types and things involved, there’s no difference regardless of what that value is. So it can be Steward, it can be Smith, it can be Jones, it doesn’t matter, the execution of the SQL statement is the same. So if I’m going to execute it over and over, I want to put parameter markers in there, and I want to use Prepare and Execute. That’s just one tip. There are many of these things, and many tips covered within the book where we say, ‘here’s a choice that you can make, this choice is better for these situations and this other choice is better for these other situations.’



  • Published: May 5th, 2009
  • Comments: no responses

Criteria for a Highly Effective Database Driver: Part 1 ODBC

Tags: , , , , , , , , ,

In this two-part podcast Rob Steward discusses what to look for in a highly effective database driver. Part 1, which runs 3:49, focuses on ODBC, while Part 2 will concentrate on JDBC.

Click on the following link to listen to the podcast: http://dataaccesshandbook.com/media/Rob9.mp3

Rob Steward:

Well of course I’ve spent my career developing, producing database drivers of all sorts: ODBC, JDBC, LAB and ADO.NET. And I’ve spent my career looking at different implementation, and looking at the way these things are written and how they interact with the databases. And I can tell you unequivocally that they make a huge difference in terms of your overall performance, reliability, and scalability of your data access applications.

There are several things that you need to look at when you look at those drivers from a cursory glance – or from a quick look – at a driver, what separates one from the other? The first thing I’d say is the architecture of that driver. ODBC was the first real standards based API that came out for which companies built drivers. That driver, what it does eventually is translated from that common API into some underlying thing that talks to a database or a data source. Data drivers do more than just that translation, but I’ll concentrate on that.

What it translates to underneath, when ODBC was originally released – or if you look at say JDBC when it originally came out and there were really just the beginning of JDBC drivers as well – the way they were written was you had this driver piece that sat on top of some other client API from the database vendors themselves. So let’s say in the case of Oracle you may have an ODBC driver, which is this layer that does this translation – it handles the error mappings, and it handles characters and translations, and all kinds of other things that it does. But it sits on top of what’s called OCI from Oracle – the client interface that Oracle has – and then that actually talks across the network to the database server, to the Oracle server. Same thing is true with the DB2, or with SQL server, or Sybase or Informix or any of the other ones we’re familiar with.

One big difference there are with drivers is whether that driver can talk directly to the database across the network, or whether it needs to sit on top of some other layer. What we call that in ODBC terms, we’ll call that a wire protocol driver. So if a driver doesn’t need any of those underlying client pieces from the database vendor, and it can directly open up the TCP/IP Docket or same pipe or something to the database, then we call that a wire protocol driver. It is a standalone piece that you can put with your application that can talk directly to the database. It doesn’t need some instillation of some other piece, which can cause you all kinds of versioning issues. Particularly we see it with Oracle where people say, ‘well I need a particular version of the ODBC driver, but I also need some particular version of this client piece, but I have another application that sits on the same machine that needs a different version of that client piece or a different version of the ODBC driver. And what you end up with is a big mess where it’s difficult in a Windows environment to use all of those things on the same machine.


  • Published: April 21st, 2009
  • Comments: no responses

Why The Data Access Handbook is a Must-Have Resource for Architects, Programmers and DBAs

Tags: , , , , , , , , , , ,

In this podcast Rob Steward explains who The Data Access Handbook was written for, and what benefits they will get out of its content. The podcast runs for 3:00.

You can listen to the podcast by clicking on the following link: http://dataaccesshandbook.com/media/Rob7.mp3

From the Podcast:
Why would the development audience and the database architect be interested in The Data Access Handbook?

Rob Steward:

We wrote the book specifically targeting architects, programmers and DBAs. That may seem like a broad audience, but the reason that I can say that we targeted all three of them is that the book has different sections. The beginning of it we talked more towards the architect, but it’s also beneficial for the DBA or the programmer, in that we go through the concepts of what is middleware and why does it affect things. We talk about how it actually interacts with the network. How it talks to the database. What does that mean in your overall architecture? What are the architecture choices that you make, specifically concerning the middleware and how you access your data that have a great impact on your overall performances? And that’s kind of targeted again towards architects, but also to programmers.

And then we’ve got sections of the book – there’s three chapters where we go through very specific code examples using ODBC, JDBC and ADO.NET – and those three chapters are meant to be a reference for programmers to say, ‘If we take the concepts that we’ve gone over in the beginning of the book, how to we apply those within those standards?’ What does it mean when I say, ‘you need to make sure that you manage your transactions correctly?’ What does that actually mean when it comes to writing codes ODBC and JDBC? Therefore, very specific content for programmers, but, again, backup so the architects who first look at the problem or the applications, they understand on a conceptual level of how this thing needs to be designed. And then for the programmers, what are those specifics to code that you want to write to implement the architectures that we’ve talked about.

And then for DBAs – you know I talk to DBAs all the time – and DBAs are frustrated with the same thing; I’ve been told over and over by DBAs, ‘Well everybody always blames everything on the database; that it’s always in the database.’ And those guys, the DBAs, are probably the ones that know best that the problem’s not always in the database. Because they can tune and they can do these things, and it doesn’t make any difference. So they know it’s somewhere outside of the database, but again they’re not the programmers. But what this book does for DBAs is explain some of these things that the programmers are doing that has that big effect on the overall performance. So they can understand when those problems are on in the database, what kind of problems those could be. And what do they need to tell people to look for in their applications.

So I can say that we targeted this book at architects, programmers and DBAs, and I think there’s a lot in this book for all three of ‘em.


  • Published: April 14th, 2009
  • Comments: no responses

Don’t Overlook the Importance of Middleware to Database Application Performance

Tags: , , , , , , , , , , ,

In this podcast Rob Steward explains why architects, designers, programmers, or DBAs have overlooked the middleware to improve database performance, and why he wrote The Data Access Handbook to help educate the community on the middleware’s importance. The podcast runs for 2:42.

To listen to the podcast, please click on the following link: http://dataaccesshandbook.com/media/Rob6.mp3

Podcast text:

Why was there such latency in focusing on middleware for these performance and connectivity issues?

Rob Steward:

Well I think that the reason many people overlook middleware network as a performance issues is because you can walk into Barns & Noble, or any Boarders, or you can go on Amazon, and you’ll see book after book after book written on how to tune those databases. And you can go to lots and lots of sessions and seminars, and entire conferences built on how do I tune my Oracle database? Or how do I tune my SQL server database? So what you hear as an architect, designer, programmer, or as a DBA is that all the problems occur on the database. And so that’s what the focus has been in our industry. In fact, there is a whole industry built around tuning databases. We all know somebody, who that’s their job. They may be a consultant that gets hired for a couple of months and look over somebody’s database configuration and tuning, and figure that out. Granted, that’s a very important thing. You do need to tune your database.

What’s not been out there is the general knowledge of what the impact of middleware can actually be. Now I sell software. I work for a software company that makes database connectivity, which is part of why I know all these things about the middleware, but I see it over and over. Now I’ve sold a lot of software because that middleware is actually the problem, and not the database.

And again I think the reason why that people haven’t realized it – or as much as they should – is because everything you read, and if you listen to the experts, they’re going to tell you all your problems on your database. Then if they can’t solve it by tuning, or doing the things that they do, they’re going to say you need better hardware. Well I’ve seen hundreds and thousands of times, literally, that that’s not the answer. The actually problem is in that middleware layer.

So I think people tend to learn, but they learn it the hard way. They’ve learned that they’re application doesn’t perform well enough. They’ve done what all the experts say. They may have spent millions of dollars hiring people to come in to try to tune their environment, and it still doesn’t work. And the reason it doesn’t work, again, is because that’s not where their problem was, or it’s where a small part of their problem was.

I guess I would sum it up by saying it is education. And again, the reason I wrote a book on the subject is because there’s just not any information out there, and there’s not people talking about what kind of impact middleware can have.

  • Published: March 31st, 2009
  • Comments: no responses

Best Practices to Significantly Improve Database Application Performance and Scalability

Tags: , , , , , , ,

In this podcast, Rob Steward explains why readers will find The Data Access Handbook useful. The podcast runs for 1:38.

To listen to the podcast, please click on the following link: http://dataaccesshandbook.com/media/Rob4.mp3

Podcast text:

What will someone get out of The Data Access Handbook?

Rob Steward:

What you would get from The Data Access Handbook is a whole lot of knowledge, as well as tips and tricks, on how to make your database application performance scale much better. The book is full of explanations about why middleware and the network impact your performance. And then there are a number of chapters that we go into in significant depth on how these things affect your performance.

We also have a number of chapters where we talk specifically about coding with standards based APIs. So if you’re using ODBC, or you’re using JDBC, or you’re using ADO.NET, what are the specific things that you can do within those data access APIs to make your performance significantly better?

Also, how do you tune those things? We talk about tuning databases, configuration options, and indexes, and all those kinds of things. But middleware typically has a lot of tuning options as well. That can have just as big of impact on your overall performance as those tuning options on your server.

We go through the concepts in the book, and how they overall affect your application’s performance. But we also go into detail to tell you exactly how you manipulate those things to make them work to make your database application work faster.


  • Published: March 24th, 2009
  • Comments: no responses

Why Write The Data Access Handbook? The Authors Explain

Tags: ,

In this podcast, Rob Steward explains why he and Coauthor, John Goodson, wrote The Data Access Handbook. The podcast runs for 4:58.

To listen to the podcast, please click on the following link: http://dataaccesshandbook.com/media/Rob3.mp3

Podcast text:

Why did you write The Data Access Handbook?

Rob Steward:

We wrote the book because we have been told that the world is flat. What I mean by that, by the fourth century Greek philosophers had started to offer compelling evidence that the world was in fact round. But it still took several hundred more years before anybody started to believe them. I see the same thing happens in the database application performance area. What we’re told, and what all the experts tell you, is that if you rollout your application, and you’ve got performance and scalability problems, that all the problems exist in the database. What they’ll tell you is: ‘You need to tune your database;’ ‘you need to do the right configuration options;’ ‘you need to create the right schemas, with the right tables, and the right relationships, and the right indexes.’ And that will solve all of your problems. Essentially what they’re telling you is that the world is flat, because that’s not all that there is to it.
Even once you’ve done that and you’ve tuned your database, often times we find that the performance problems are not solved. And the reason for that is because in today’s environment we see 75-95% of the time accessing your data is actually in the middleware or on the network.

I’ve seen, literally hundreds or thousands of times in my career, where people are doing exactly what the experts say: Let’s tune those things; let’s get ‘em right. And it still doesn’t resolve the problem, or it only solves a very small part of their problem. I’ve spent the last ten year, a large percentage of my time, going around talking to people at conferences like JavaOne, TechEd and many other conferences, and talking about what are the performance implications of the middleware. I’ve written magazine articles, written whitepapers; my coauthor, John, has been doing the same thing for 10 or 15 years now. Between the two of us we have a combined 35 years of building database middleware. So we’ve built up a lot of knowledge, we’ve spent a lot of time sharing that knowledge, and we’ve helped a lot of people.

I like to tell this particular story, because I think it shows you the kind of impact I’m talking about. A couple of years ago I was giving a talk on performance, and there is a particular tip that I give out – which I’ve been giving out for a number of years and it’s also in the book – the particular tip about how to change one little thing and it explains how that makes a difference, why it reduces lots of network round trips, and the amount of I/O that you cause to happen on the database server. So I was giving the talk and a guy came up to me and he said, ‘Rob, I heard you give this particular tip about a year ago at another place you were speaking. I went back to work and implemented that particular tip. We were doing an operation where we were inserting 5 million rows into a database, and it was taking 8 hours to do. So we made this one change that you suggested, and all of a sudden it took ten minutes instead of 8 hours.’ Now that is an extreme example, but it is a real-world example that I like to use to explain what it is we are talking about. That one little change – they didn’t change their database, they didn’t tune it, they didn’t make any changes on that side – all they did was make a couple of lines of code change, and that’s the kind of benefit that they saw: an operation that took 8 hours now took 10 minutes.

Now the reason that they didn’t know about that before is because you can’t find information like you can find in this book. You can walk into any bookstore and you’ll find book after book after book on how you tune those database servers. But what you won’t find, at least until now, is a single book that says well, how do I write the best data access code? And how I tune that middleware? And how does it actually work? And how does it influence things? There has never been a book like this before. It’s covering a new ground that the experts you are used to don’t talk about.

Two years ago we finally decided that we need to take all of this information that we’ve been sharing with people for years – through articles, white papers and at conferences – and put it into a form that we can share all of this with a lot more people. That’s ultimately why we wrote the book.


Book Content Copyright © 2009 by Prentice Hall PTR. All rights reserved. | Corporate Sponsor DataDirect Technologies.