Limitation on output rows in Transformation Flow

Timofey shared this question 5 months ago
Answered

Hi,

I've tried to use a transformation flow to join data from different data sources.

I have a problem with the output table: there are only 400 rows in the output source table.

I've tried several times with different input data containing more than 10,000 rows. I tried an input step, a transformation step and an output step, and also only an input step and an output step - there are always only 400 rows in the view built on the output table.

Can I change this limitation?

Comments (9)


Hi Timofey,

I've just done a quick test in my Yellowfin and was successfully able to output a table of 26,000 rows.

There is a Row Limit set in the Yellowfin Data Source as follows:

[screenshot: the Row Limit setting in the Yellowfin data source]

so that would definitely be worth checking.

And just in case that isn't the reason for your 400 row output, could you please show a screenshot of the Transformation Summary such as the one below:

[screenshot: example Transformation Summary]


and also please zip up and send across all your log files.

thanks,

David


Hi Dave,

I'm glad that you answered.

The Max Row Limit was turned off.

Here is a screenshot from the transformation flow edit window:

[screenshot: transformation flow edit window]

Here is a screenshot of the Transformation Summary:

[screenshot: Transformation Summary]

After that, I edited my Transformation Flow, turned on Enable Scheduling, and ran a preview. It was the same 'no logs' in the Transformation Summary. Then I pressed the Run now button in schedule management, and now it's OK! I see my 20k rows!

[screenshot: output showing 20k rows]


Hi Timofey,

that's great news, I'm glad it's showing the 20k rows now.

It sounds like maybe you were only running the preview previously, not the full run.

Thanks for sharing the good news!

regards,

David


Hi Dave,

I've run into another problem:

I've tried to merge data from two data sources. As you can see in the screenshot, there are 6 million rows in one data source and 20k in the other. When I ran this Transformation Flow for the first time, the output stage took 40 minutes, after which Yellowfin stopped responding and we had to restart it.

[screenshot: transformation flow with the two input data sources]

Later, when the Transformation Flow ran on its schedule, it finished with a cancelled status.

[screenshot: scheduled run with cancelled status]

You can find our logs in the attachment.

The question is: why does the output stage take so much time, and how can I speed it up?


Hi Timofey,

yes, inserting large numbers of records can be very slow. If you research this subject online you will find lots of articles and forum posts about it, with advice on how to improve the performance.
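The most common cause of slow bulk loads is inserting rows one statement (and one commit) at a time. This is not Yellowfin's internal code, just a minimal sketch of the batching idea using Python's built-in sqlite3 module, with a hypothetical `merged` table:

```python
import sqlite3

# In-memory database with a hypothetical target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merged (id INTEGER, value TEXT)")

rows = [(i, f"row-{i}") for i in range(100_000)]

# Batched insert: executemany reuses one prepared statement and the
# "with conn" block wraps everything in a single transaction, instead
# of 100,000 separate statements and commits.
with conn:
    conn.executemany("INSERT INTO merged VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM merged").fetchone()[0]
print(count)  # 100000
```

The same principle applies to any JDBC target: batch the inserts and commit once per batch rather than per row.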

As to why the timeout is occurring, it is most probably the timeout on the data source you are inserting into, so it would be good to increase the Timeout value in the Yellowfin data source:

[screenshot: the Timeout setting in the Yellowfin data source]


and if that doesn't help then it also might be the connection to the Yellowfin configuration database that is timing out, in which case you should increase the JDBCTimeout value as described in the following Knowledge Base article:

https://community.yellowfinbi.com/knowledge-base/article/how-to-increase-the-connection-timeout-to-the-yellowfin-database
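For reference, as far as I know the JDBCTimeout described in that article is an init-param in Yellowfin's web.xml (under the web app's WEB-INF directory). The enclosing servlet definition and the default value vary by version, so treat this fragment as a sketch and follow the KB article for your install:

```xml
<!-- Sketch only: value is in seconds; 600 is an illustrative value.
     This init-param sits inside an existing servlet definition in web.xml. -->
<init-param>
    <param-name>JDBCTimeout</param-name>
    <param-value>600</param-value>
</init-param>
```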

regards,

David


Hello again Timofey,

actually, I just looked through your log file and saw the following errors, which indicate to me that ClickHouse is refusing Yellowfin's connections, so most likely you'll need to configure something in ClickHouse to guard against that happening again.

YF:2018-07-02 11:52:05:ERROR (DBAction:error) - Error occurred when connecting to the database: java.lang.RuntimeException: ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 210, host: localhost, port: 8123; Connect to localhost:8123 [localhost/127.0.0.1] failed: Connection refused (Connection refused)
java.lang.RuntimeException: ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 210, host: localhost, port: 8123; Connect to localhost:8123 [localhost/127.0.0.1] failed: Connection refused (Connection refused)


YF:2018-07-02 12:29:08:ERROR (DBAction:error) - Error occured selecting data: ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 291, host: prod45.auroraplatform.com, port: 8123; Code: 291, e.displayText() = DB::Exception: Access denied to database events, e.what() = DB::Exception

and also I noticed an out-of-memory error:

YF:2018-06-30 13:47:18:ERROR (ExecutionQueue:error) - Task Failed: TF.First_visit_CH.Registration_E-Sh
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
which means you should increase your JVM's heap space as described in the following KB article:

https://community.yellowfinbi.com/knowledge-base/article/what-is-jvm-max-memory-and-why-should-i-care
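Since Yellowfin runs inside Tomcat, one common way to raise the heap on a Linux install is via JAVA_OPTS (the exact file and location vary by version and platform, so the KB article above is authoritative; the 4 GB figure below is only illustrative):

```shell
# Hypothetical example, e.g. in Tomcat's bin/setenv.sh:
# -Xms sets the initial heap, -Xmx the maximum heap the JVM may use.
export JAVA_OPTS="$JAVA_OPTS -Xms1024m -Xmx4096m"
```

A "GC overhead limit exceeded" error like the one above means the JVM was spending almost all its time in garbage collection, which a larger -Xmx usually relieves.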

also I saw the following PostgreSQL error, so it looks like I was right when I previously said to increase the data source timeout and the JDBCTimeout (I think you are using PostgreSQL for both a data source and the Yellowfin database):

YF:2018-06-30 04:05:00:ERROR (ETLProcessTask:error) - org.postgresql.util.PSQLException: This connection has been closed.
org.postgresql.util.PSQLException: This connection has been closed.
So, in summary, there are 4 things you should do: increase the JVM's memory, increase the YF data source timeout for PostgreSQL, increase the YF data source timeout for ClickHouse, and increase the YF timeout for the connection to the YF DB (JDBCTimeout).

And if there are still issues after doing all of that then please send the latest log file.


regards,

David


Hi Timofey,

just checking how you are getting on with this matter, did you get a chance yet to try those 4 suggestions, and if so, how did it go?

regards,

David


Hi David,

I have tried those 4 suggestions, but unfortunately the Transformation Flow still takes too much time. So, for our purposes, we've found another solution: we connected the PostgreSQL database to ClickHouse, do the comparison there, and use the standard Yellowfin view option.


Hi Timofey,

well, I'm sorry to hear that the data transformation took too much time for you, but congratulations on coming up with another solution - it sounds like a clever way around the problem, and that's what good IT projects are all about!

regards,

David