Add option to configure precision on the ETL Output step

Edgar Kautzner shared this idea 4 years ago
Idea Logged

Hi

I'm troubleshooting an issue with a (fairly simple) data transformation and hoping to get some assistance.

There are two steps in the data flow:

- Extract from query (this extracts four columns from a DB)

- Load to db (this loads the output to a table)

When I run the transformation, the input step succeeds; however, the output step fails with the error java.lang.Throwable: Data truncation.

The only thing a little out of the ordinary about the data returned by the query is that one of the columns has a high degree of precision: it is of data type DECIMAL(30,20).
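
For reference, a value that fills out DECIMAL(30,20) looks something like the following (an illustrative cast only, not my actual data):

    SELECT CAST(1234567890.12345678901234567890 AS DECIMAL(30,20)) AS metric;
    -- 10 digits before the decimal point, 20 after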


I tried to resolve the issue by changing the precision on the input step to 20; however, this did not help.

I couldn't find anything useful in the data source or JDBC logs; however, the following error was present in the yellowfin.log file:


YF:2019-08-02 10:36:29:ERROR (ETLOutputToDatabaseStep:endRows) - Error: java.sql.BatchUpdateException: Data truncation
java.sql.BatchUpdateException: Data truncation
    at net.sourceforge.jtds.jdbc.JtdsStatement.executeBatch(JtdsStatement.java:1077)
    at com.hof.mi.process.CSVViewProcess.insertRows(CSVViewProcess.java:1273)
    at com.hof.mi.etl.step.definition.ETLOutputToDatabaseStep.bulkRowInsert(ETLOutputToDatabaseStep.java:1309)
    at com.hof.mi.etl.step.definition.ETLOutputToDatabaseStep.preEndRows(ETLOutputToDatabaseStep.java:1379)
    at com.hof.mi.etl.step.AbstractETLRowStep.endRows(AbstractETLRowStep.java:233)
    at com.hof.mi.etl.step.AbstractETLRowStep.endRows(AbstractETLRowStep.java:245)
    at com.hof.mi.etl.step.AbstractETLCachedStep.D(AbstractETLCachedStep.java:189)
    at com.hof.mi.etl.step.AbstractETLCachedStep.endRows(AbstractETLCachedStep.java:125)
    at com.hof.mi.etl.runner.ExecutionHead.call(ExecutionHead.java:29)
    at com.hof.mi.etl.runner.ETLRunnerImpl.run(ETLRunnerImpl.java:170)
    at com.hof.mi.util.background.ETLQueueItem.process(ETLQueueItem.java:154)
    at com.hof.mi.util.background.ETLQueueItem.process(ETLQueueItem.java:42)
    at com.hof.mi.util.background.ExecutionQueue$TaskRunner.call(ExecutionQueue.java:473)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

YF:2019-08-02 10:36:29:ERROR (AbstractETLRowStep:D) - Error
com.hof.mi.etl.ETLException: java.sql.BatchUpdateException: Data truncation
    at com.hof.mi.etl.ETLException.getETLException(ETLException.java:45)
    at com.hof.mi.etl.ETLException.throwETLException(ETLException.java:38)
    at com.hof.mi.etl.step.AbstractETLStep.throwUnhandledETLException(AbstractETLStep.java:668)
    at com.hof.mi.etl.step.definition.ETLOutputToDatabaseStep.preEndRows(ETLOutputToDatabaseStep.java:1390)
    at com.hof.mi.etl.step.AbstractETLRowStep.endRows(AbstractETLRowStep.java:233)
    at com.hof.mi.etl.step.AbstractETLRowStep.endRows(AbstractETLRowStep.java:245)
    at com.hof.mi.etl.step.AbstractETLCachedStep.D(AbstractETLCachedStep.java:189)
    at com.hof.mi.etl.step.AbstractETLCachedStep.endRows(AbstractETLCachedStep.java:125)
    at com.hof.mi.etl.runner.ExecutionHead.call(ExecutionHead.java:29)
    at com.hof.mi.etl.runner.ETLRunnerImpl.run(ETLRunnerImpl.java:170)
    at com.hof.mi.util.background.ETLQueueItem.process(ETLQueueItem.java:154)
    at com.hof.mi.util.background.ETLQueueItem.process(ETLQueueItem.java:42)
    at com.hof.mi.util.background.ExecutionQueue$TaskRunner.call(ExecutionQueue.java:473)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)


Cheers,

Edgar

Replies (10)


Hi Edgar,

Thanks for reaching out. To fully test this, I'll need to know which RDBMS you're testing with. Unfortunately, the stack trace doesn't give any indication of this.

I went ahead and tried this with MySQL anyway to see what happened. If I understand correctly, you're pulling data from four columns, and I'm assuming one of these columns contains a numeric value with more than 30 digits in total and/or more than 20 decimal places? And you're seeing this error when attempting to output these columns into a database table where one of the columns is set to DECIMAL(30,20). Is that correct?

If so, I think this is to be expected, at least with MySQL. If I simply try to draft an INSERT statement in MySQL (which is of course ultimately what's happening in the transformation when outputting to a table) that contains a number with more than 20 decimal places, the value is automatically truncated.

Here's my example:

INSERT statement featuring fewer than 20 decimal places:

[screenshot]

Works without error.

INSERT statement with a value featuring more than 20 decimal places:

[screenshot]

I get a warning, and if I select this column from my table, all the values I've inserted have 20 decimal places.
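
For the record, my test amounted to roughly the following (the table and values here are just my own examples):

    CREATE TABLE decimal_test (val DECIMAL(30,20));

    -- Fewer than 20 decimal places: inserts without error
    INSERT INTO decimal_test VALUES (1.12345678901234567);

    -- More than 20 decimal places: MySQL stores the value cut down to
    -- 20 decimal places and raises a "Data truncated" warning
    INSERT INTO decimal_test VALUES (1.123456789012345678901234);

    SELECT val FROM decimal_test;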

I guess ultimately, though, the values should still be inserted even if the database throws some sort of warning. Please let me know if my understanding above is correct, so I can create a column containing a number with more than 20 decimal places and then try writing it to a table featuring a column set to DECIMAL(30,20) via a Transformation Flow.


Thanks,

Mike


Hi Edgar,

I just wanted to check in and see how things are going with this.

Regards,

Mike


Hi Mike,

Sorry for the delay in responding. Firstly, the system is using MS SQL Server.

One piece of information I didn't add here was that the transformation is set to drop the table at each refresh.

I did some more digging and noticed that the output of the stored procedure has a data type of DECIMAL(30,20); however, the field was being created in Yellowfin as DECIMAL(38,30). This was causing an overflow on the left-hand side of the number, as it only allowed 8 digits before the decimal point rather than 10.
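
To illustrate with made-up values:

    -- DECIMAL(38,30) leaves only 38 - 30 = 8 digits before the decimal
    -- point, whereas DECIMAL(30,20) allows 30 - 20 = 10
    DECLARE @source_value DECIMAL(30,20) = 1234567890.1;  -- fits: 10 integer digits
    DECLARE @target_value DECIMAL(38,30) = 1234567890.1;  -- fails: only 8 allowed
    -- Msg 8115: Arithmetic overflow error converting numeric to data type numeric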

I've worked around this by changing the Data Transformation to truncate (not drop) the target table and manually adjusting the precision of the target table's columns.
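
In SQL Server terms, the manual adjustment amounts to something like this (the table and column names here are placeholders, not my real schema):

    ALTER TABLE dbo.MyOutputTable ALTER COLUMN high_precision_metric DECIMAL(30,20);
    -- with the flow set to truncate rather than drop, each refresh then
    -- empties and reloads the correctly sized table instead of recreating it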

Thanks for your help.

Cheers,

Edgar


Hi Edgar,

Thanks for confirming and explaining your solution. Please let us know if you'd like us to continue troubleshooting the cause of this; if so, could you please provide some screenshots of your setup so we can attempt replication and log this? Basically, I'd just need some screenshots of your Transformation Flow setup, as well as some information on what the Stored Procedure itself is doing.


Thanks,

Mike


Hi Mike,

I believe there is a bug here, so it would probably be a good idea to look into it, as others are likely to run into it. Personally, I'm going to settle for the workaround to avoid spending more time on it.

The DT setup is very simple and contains two steps.

[screenshot of the Transformation Flow]

The first step just executes the stored procedure (EXECUTE dbo.xxxx). This returns 4 columns of data, one containing the high-precision metric.
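
For context, the procedure is shaped roughly like this (column and table names changed):

    CREATE PROCEDURE dbo.xxxx
    AS
    BEGIN
        SELECT col_a, col_b, col_c,
               high_precision_metric  -- DECIMAL(30,20)
        FROM dbo.SourceTable;
    END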

The second step just outputs to a DB table. It was originally set to "drop existing", so I would expect Yellowfin to detect the precision (from step 1, which ran successfully) and create a table with the appropriate precision to load the data into.

Cheers,

Edgar


Hi Edgar,

Thanks for reaching out. I've been able to replicate this. Not with the exact same changes, likely due to differences in setup, but the end result is the same.

After executing a Stored Procedure and calling fields with the following values:

[screenshot]

And selecting Drop Existing:

They were transformed to:

[screenshot]


I've gone ahead and logged a defect for this. Any potential updates regarding this will be posted here.

Regards,

Mike


Hi Edgar,

I've received feedback from the dev team on this; in short, the output step calculates precision based on the data that flows into it. Here are the further details: "The ETL output step calculates number precision from all results. I think the problem can happen when the stored procedure declares the data type DECIMAL(30,20) for the numeric column, but the results of calculations fall into a smaller range such as DECIMAL(28,20). In this case, the ETL output step creates a new table with the calculated precision, which can cause data overflow. This seems more like an enhancement that would allow the user to set the precision of the output column. It can be worked around by using the UPDATE and TRUNCATE options, and I feel it makes sense to calculate precision for a new table."
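
To put the dev team's example in concrete terms (hypothetical table and values):

    -- Suppose the first run's results all fit within the calculated
    -- precision, so the output step creates the target table as:
    CREATE TABLE etl_output (metric DECIMAL(28,20));

    -- A later run produces a value that is perfectly legal for the declared
    -- DECIMAL(30,20) but needs 10 integer digits, which DECIMAL(28,20)
    -- (28 - 20 = 8 integer digits) cannot hold:
    INSERT INTO etl_output (metric) VALUES (1234567890.12345678901234567890);
    -- SQL Server raises Msg 8115 (arithmetic overflow), which jTDS appears
    -- to surface as the "Data truncation" batch error in your log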

Since the TRUNCATE option does get us around this issue, is there a reason why it couldn't be used instead?

Regards,

Mike


Hi Edgar,

I just wanted to check in and see if you had any comments on the feedback provided in my latest response.

Regards,

Mike


Hi Mike,

Thanks for getting back to me. This issue was from so long ago that I'm not sure how I worked around it, but it's all good now.

I like the idea of being able to set the precision on the output step.

What wasn't working for me was the following:

  1. A stored procedure returning a table of data with a given precision, and
  2. An output step (using drop) to drop the table and recreate it every time the DT runs.

It seems that the output step didn't correctly determine the precision of the required table based on the data returned from the stored proc.

Anyway, feel free to close this item and/or log a task for the dev backlog.

Thanks!

Edgar


Hi Edgar,

Since this was a dev suggestion that you're in favor of as well, I'm happy to change this from a Problem item to an Idea and submit an enhancement request to add an option to configure precision on the ETL Output step. I've gone ahead and done so and will keep you posted here with any updates regarding this.

Regards,

Mike
