Tuesday, 25 July 2017

Enhance Your MySQL XML Import Procedures using Prepared Statements

In my Importing XML Data into MySQL Tables Using a Stored Procedure article, I outlined some ways to work around MySQL's restrictions on stored procedures to import XML data into your MySQL database tables. That article covered the basics of formatting your XML files as well as writing your proc to process the data. In today's article, we'll be enhancing our proc to use a Prepared Statement, include error handling and validation, as well as handling additional XML formats.

Using a Prepared Statement

Added way back in version 4.1, prepared statements are old hat to programming languages like C++ and Java. Prepared Statements are SQL statements that are compiled using the client/server binary protocol specification. MySQL provides its own SQL interface for prepared statements that you can access directly within your stored procedures. While not as fast as using the binary protocol through a prepared statement API, it can be utilized directly at the SQL level.

The following code, which appears in the import_applicants_xml stored procedure described in the previous article, is responsible for inserting the XML values into the applicants table:

insert into applicants values (
 extractValue(xml_content, concat(v_xpath_row, '[1]')),
 extractValue(xml_content, concat(v_xpath_row, '[2]')),
 extractValue(xml_content, concat(v_xpath_row, '[3]'))

Contrast the above with the following prepared statement, following the typical SQL syntax for prepared statements, whereby:
  1. PREPARE prepares a statement for execution.
  2. EXECUTE executes the prepared statement.
  3. DEALLOCATE PREPARE releases the prepared statement.
SET @sql_stmt = 'insert into applicants values (?, ?, ?)';
 PREPARE stmt FROM @sql_stmt;
SET @a = extractValue(xml_content, concat(v_xpath_row, '[1]'));
 SET @b = extractValue(xml_content, concat(v_xpath_row, '[2]'));
 SET @c = extractValue(xml_content, concat(v_xpath_row, '[3]'));
EXECUTE stmt USING @a, @b, @c;

As any developer knows, error handling is a must when executing any process that alters data. An import is no exception. When an error occurs inside a stored procedure, it's crucial that you do something with it. Your two main choices are to either continue processing or exit the proc. In either case, you should issue a meaningful error message that acknowledges that an error occurred. Sometimes your error handling can be as simple as an if test. In the following example, the number of XML node attributes is compared to the number of columns in the target table so that an error may be displayed and processing circumvented if they don't match:

set v_attribute_count = extractValue(xml_content,concat('count(', v_xpath_row, ')'));
if v_attribute_count != column_count then
 select concat('Attribute count doesn\'t match the number of columns at record ',

For more thorough error handling, use the DECLARE HANDLER statement:

DECLARE action HANDLER FOR condition_value statement;
When an error condition occurs whose value matches the condition_value, MySQL executes the associated statement and either continues or exits the current code block (BEGIN … END) based on the action. Hence, the action may apply to a nested code block or to the entire proc.

The action is limited to one of the following two values:
  • CONTINUE: the execution of the enclosing code block continues along its merry way.
  • EXIT: the execution of the enclosing code block, where the handler is declared, terminates.
The condition_value may be any of the following values:
  • A numeric MySQL error code.
  • An SQLSTATE value, including the SQLWARNING, NOTFOUND or SQLEXCEPTION named constants.
  • A named condition associated with either a MySQL error code or SQLSTATE value using the DECLARE ... CONDITION statement.
One of the most common issues when inserting new records is error code 1062, which represents a Duplicate Key. Therefore, that one should be singled out. It's also a good idea to include some kind of generic handlers for SQLEXCEPTION and SQLWARNING conditions.

DECLARE CONTINUE HANDLER FOR 1062 SELECT concat('Duplicate key at record ', v_row_index);
 DECLARE EXIT HANDLER FOR SQLEXCEPTION SELECT concat('SQLException encountered at record ', v_row_index);
 DECLARE CONTINUE HANDLER FOR SQLWARNING SELECT concat('SQLWarning encountered at record ', v_row_index);

The action part of the statement may be placed within a BEGIN…END block to include more than one line. That allows us to rollback the transaction in the event of an error.

 SELECT concat('Duplicate key at record ', v_row_index, '. Aborting.');
-- later on…
 SET @sql_stmt = 'insert into applicants values (?, ?, ?)'
-- more code…
EXECUTE stmt USING @a, @b, @c;

The above handler would produce the following message if we were to accidentally run the import twice on the same file:

Duplicate key at record 1. Aborting.

Accepting Other Data Formats

XML's flexibility in node structuring is a double-edged sword to us developers who have to be prepared to deal with numerous formats. Since MySQL does not provide any means of mapping data elements to the target table(s), we must rely on XPath selectors to extract the data.

In the last article we worked against the following format where fields were represented by node attributes:

<?xml version="1.0"?>
 <applicant id="1" fname="Rob" lname="Gravelle"/>
 <applicant id="2" fname="Al" lname="Bundy"/>
 <applicant id="3" fname="Little" lname="Richard"/>

To give you some idea as to what degree a different XML format can make on your proc, consider the following one where the first and last name fields are contained between tags rather than in attributes:

<?xml version="1.0"?>
<applicant id="1"><fname>Rob</fname><lname>Gravelle</lname></applicant>
<applicant id="2"><fname>Al</fname><lname>Bundy</lname></applicant>
<applicant id="3"><fname>Little</fname><lname>Richard</lname></applicant>

Besides changes to the XPath, the import proc below contains a few other modifications worth noting.
  1. The ID is extracted from the XML rather than being set to the loop index.
  2. The validation now compares the number of nodes against the number of columns minus one (the ID).
  3. The number of successful inserts and failed ones are tracked and displayed at the end.
Here is the proc in its entirety:

CREATE DEFINER=`root`@`localhost` PROCEDURE `import_xml`(path varchar(255), node varchar(255))
declare xml_content text;
declare v_row_index int unsigned default 0;
declare v_row_count int unsigned;
declare v_node_count int unsigned;
declare v_xpath_row varchar(255);
declare v_column_count int;
declare v_id int;
declare v_inserted_rows int default 0;
declare v_error_rows int default 0;
SELECT concat('Duplicate key at record ', v_row_index, '. Aborting.');
DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SELECT concat('SQLException encountered at record ', v_row_index);
DECLARE CONTINUE HANDLER FOR SQLWARNING SELECT concat('SQLWarning encountered at record ', v_row_index);
set xml_content = load_file(path);
-- calculate the number of row elements.
set v_row_count = extractValue(xml_content,concat('count(',node, ')'));
set v_column_count = (SELECT count(*) FROM information_schema.columns WHERE table_name = 'applicants');
-- select extractValue(xml_content,'count(/applicant_list/applicant[1]/@*)');
-- loop through all the row elements
while v_row_index < v_row_count do
set v_row_index = v_row_index + 1;
set v_xpath_row = concat(node, '[', v_row_index, ']');
-- calculate the number of nodes.
set v_node_count = extractValue(xml_content,concat('count(', v_xpath_row, '/*)'));
if v_node_count != v_column_count - 1 then
-- insert into temp_xml_warnings values(v_row_index, concat(v_xpath_row, '/@*'));
select concat('Node count doesn\'t match the number of columns at record ', v_row_index);
set v_id = convert(extractValue(xml_content, concat(v_xpath_row, '/@id')), UNSIGNED INT);
if v_id = 0 then -- invalid!
select concat('ID is invalid at record ', v_row_index);
set v_error_rows = v_error_rows + 1;
SET @sql_stmt = 'insert into applicants values (?, ?, ?)';
PREPARE stmt FROM @sql_stmt;
SET @id = v_id;
SET @ln = extractValue(xml_content, concat(v_xpath_row, '/lname'));
SET @fn = extractValue(xml_content, concat(v_xpath_row, '/fname'));
EXECUTE stmt USING @id, @ln, @fn;
set v_inserted_rows = v_inserted_rows + 1;
end if;
end if;
end while;
select concat('Inserted ', v_inserted_rows, ' rows. Failed to insert ', v_error_rows, ' rows.');

Calling the above proc would result in a message similar to the following after a successful run:

Inserted 3 rows. Failed to insert 0 rows.