Subquery fields from different view slower to populate in combined report

Jose Rivera shared this problem 6 months ago
Resolved

Hi,

We have a report that consists of data from a view A generated by one data source (a MySQL db) and on top of it we create a subquery that merges A with data from another view (a PostgreSQL database). Effectively this report consists of data from a view + subquery B from another view. When running the reports, or dashboards, the columns corresponding to the subquery B take longer to appear, sometimes hours. And we're concerned that they might just not appear, ever.

On the other hand, If we individually check rows (by filtering some ids) the data corresponding to the subquery are there, but they don't however appear in the "unfiltered" report.

Is anyone aware of this problem? Does it depend on my setup? If required by support we could open a private conversation where I could share more details.

Yellowfin version 7.4.6.

Thanks!!

Comments (3)

photo
1

Hi Jose,

I have just done a basic test in 7.4.6 using a report based on a MySQL data source that also has an Advanced Subquery to a PostgreSQL data source and I didn't notice any performance issues with the columns from the subquery, so at this point of the investigation we can say that it isn't a general bug with Advanced Subqueries, however, we can't say much else unfortunately until we do more diagnostic tests.

The most obvious test would be to test the comparative speeds of both databases, I'm sure you would have already done this, but if not then please be careful not to do the test by running queries directly on the databases. Instead, for a more accurate test you should set up a database query tool (e.g. SQuirreL, DBVisualizer etc.) on the same server that Yellowfin is on and then connect to the 2 databases via JDBC drivers and then run the tests.

Another thing is that I noticed from what you described of the problem that it sounds like the behaviour of the subquery columns is inconsistent, and this reminded me of a ticket I recently worked on in which the subquery's data source was a big transactional database that was constantly changing every minute, and the Yellowfin data source for it had the default row limit of 10,000 rows. And it turned out that this was the cause of their subquery data being different each time, sometimes slower and sometimes not there at all. What was happening was that the sort order of the data returned via the JDBC driver is "native" (for want of better word) to the database, in other words, it isn't sorted by most recent, or alphabetically etc. (I wondered if it might be the physical position in storage, i.e. the contiguous elements of a linked list), and so when Yellowfin grabs the first 10,000 it is going to be a different result set every minute according to how the database orders the results. And then you can imagine how this affects the join between the main query and subquery!

So this is one thing for you to check, is the subquery data source a large transactional database? And do you have any row limits applied within Yellowfin (these may be done at the data source, the view and report level so make sure you check all 3 levels).

Also, for all sorts of performance problems we usually recommend that you take a series of thread snapshots of Yellowfin over a minute or two and then send them to us for analysis as this will show us what is happening in the background of Yellowfin. Thankfully the creation of these thread snapshots is easy due to the handy Yellowfin utility called The InfoThreadRunner, it is very easy to use and I'd recommend that you set the frequency of snapshots to 1 per second and then run it for a couple of minutes as you start running one of the reports with a subquery.

I look forward to hearing back from you!

(by the way, I can convert this Problem into a private ticket if you would like - just let me know)

regards,

David

photo
1

Thanks a lot David, you saved our Success Team from doing manual VLOOKUPs outside Yellowfin because the combined report wouldn't have consistent results.

Indeed the issue was the row limit. But not only at the data source level, but also the limits in the reports that once combined generated the final one via subqueries.

So our final solution was to increase the limit of the data source dbs AND the reports that made up the final subquery.


Thanks again,


Jose

photo
1

That's great news Jose, thanks very much for letting me know!


It is always pleasing for me to learn that I was of some help!

thanks,

David

photo