Client Org row limit on Advanced Subqueries when using Data Source Substitution

Brendan Codrington shared this question 6 months ago
Answered

Hi,

This is an addendum to https://community.yellowfinbi.com/ticket/15717, where row limits were being applied across subqueries.

We've now applied the patch (and more), and it's worked for simple sub-queries, however the issue is still occurring on advanced subqueries using a nominally 'different' source, even though - importantly - that different source is actually using the same database as the master query. (It's a source in the parent org, querying back into the client org's database.)

The row limit on the master is applied to the subquery, even if both the master source and subquery source have no row limits on them (and it is only set in the master query).

I've ensured the client org and parent org sources both use the same type of connection (Microsoft SQL server) and driver.

Is this expected, because it is querying via a different source to the same database, or is this a defect?

Thanks,

Brendan

Comments (16)

photo
1

Hi Brendan,

Thanks for following that up!

Would you be able to check a couple of things for me?

  1. In the Views where the subqueries are being limited, could you go into View Options > Performance and let me know if there are any limits implemented there? (Screenshot 1)
  2. In the Data Source where the subqueries are being limited, could you please go into Admin Console > Data Source > Usage Parameters and let me know if there are any limits implemented there? (Screenshot 2)

I suspect that one of these settings might be overwriting the other and causing your queries to come back limited.


Let me know how you go.


Kind regards,

Simon

photo
1

Hi Simon,

Thanks for the quick response. There is currently no row limit on the subquery source, and the view row limit is set to 0 (if I clear this and save it, it reverts back to 0). Both screenshots attached. The limit in the original screenshot above was set on the master query (but replicates what also happens when there is a limit at the data source level).

Does that help at all?

Kind regards,

Brendan

photo
1

Hi Brendan,


Maybe try changing the row limit on the View to a value: 2,000,000 or something just to be sure. Is this the same for both the Data Sources that you are querying?


Kind regards,

Simon

photo
1

Hey Simon, yes currently both data sources have max row limit turned off and the row limit is being driven from the master query. When I apply a row limit to the subquery's view as suggested the subquery row limit still matches the master query unless the subquery's view row limit is less than the master row limit. I hope that makes sense? So it is matching the master (or the master's source) unless the subquery view row limit is less.


Edit - and yeah this occurs both ways from either data source to the other, with row limits on the data sources or just the master query.

Brendan

photo
1

Hi Brendan,


Thank you so much for sending through that information.


Would you mind clarifying what the row limit in the Master Query is? Is it also 50,000?


In order for me to understand your use case a little better, would you mind explaining what kind of report you are trying to create, maybe with screenshots where necessary? I'm having a little difficulty conceptualising what you are aiming for, where my instincts are telling me there might be either an explanation, or a better way of achieving the results you are after.


Kind regards,

Simon

photo
1

Hey Simon,

In the original screenshot provided (Advanced subquery row limit.jpg) yes the master query row limit was set to 50,000 and reflected in the subquery. I'll try and give a detailed answer to your other question (apologies if i overexplain!).

The reason for this type of report is that the client org has custom views with a mix of standard system fields and their own custom fields, set up in a simplified manner. In contrast, the views in the parent org contain only standard system fields but allow a greater range of views and complexity of fields that are managed centrally ie can be updated once not 100+ times at each client org level. It also allows report templates to be managed centrally and available for customisation.

As a user, I set up a report using my custom view and data, but find that in this instance I need some specialised fields. I subquery by a key field to pick up that additional data for matching records in the relevant parent org view. My master query is to my database using my custom source; my append subquery is also to my database but via the standard parent source.

[Just to confirm - expected behaviour is that while the master query/source has a row limit on overall results, the subquery can search all results for matches and then return those, however it is currently limiting the search to the master's row limit and then returning matches, thus omitting many matches that exist outside of that result set.]

Does that answer your question and help give any insights?

Kind regards,

Brendan

photo
1

Hi Brendan,


Thanks for that explanation, I think I understand a little better now.


It sounds like the behaviour is as expected, the reason being that the Data Sources are the same between the Client Org and the Parent Org, is that correct? The Parent Org/Client Org relationship and indeed the Report Row Limit > View Row Limit > Data Source Row Limit hierarchy both apply here, where the row limit for the Data Source in the Parent Org is applying both the Master Query and the Subquery.


I think the solution here is to free up the Row Limit restrictions from your Parent Org Data Source, and start implementing Row Limits at individual Views instead if you are concerned about performance.


I'd be curious to know why the Row Limits are set in the Parent Org Data Source in the first place, did you have concerns with performance?


Kind regards,

Simon

photo
1

Hi Simon, I'm a bit confused then - I thought that querying the same database between the master and subquery meant that the row limit (where ever it is set) is applied to the whole result set, not each individual query.

Row limits have been used to improve performance, particularly as the granularity of some data can mean a build up of a large amount of data over time, particularly with process timestamps per process step per application per job. Controlling it at the source level (both the parent org and the client org sources) has been a universal approach to provide a limit that can be adjusted per client when needed.

Even if row limits are applied at the view level as suggested, it still wouldn't address that when I'm finding a match in the subquery I need to match from the whole database then limit the report's overall results to the specified limit (of the master query, which could be inherited from its view or source), not limit the database results then find matches. While I don't want to limit the subquery (regardless of which source or view it is using) I will want to limit the overall report's results - does that make sense?

Kind regards,

Brendan

photo
1

Hi Brendan,


I understand where your confusion comes from, please bear in mind that when you are creating an Append Report, the Master Query and the Subquery are two separate queries, within one transaction. This means that the constraints put in place by the Data Source (LIMIT = 50000), will apply to both queries within the transaction.


So bearing the above in mind, I believe the work around at this stage would still be to remove the Row Limit on your Default Data Source.


I'm going to have a quick chat to one of our Developers and understand if this would be considered a defect, which it very may well be. They also might have a better idea of a more viable work around in the meantime. I can definitely see the benefit of the configuration of the Client Data Source substitution overriding the Parent Data Source, for Subqueries.


Please leave this with me, and I will get back to you once I have more information.


In the meantime, please don't hesitate to reach out if you have any further questions!


Kind regards,

Simon

photo
1

Hi Brendan,


I've just spoken to one of our Developers, who has confirmed that this behaviour is expected, and not a defect.


The reasons that the Parent Org might wish to limit the number of rows returned are exactly the reasons you've mentioned before relating to performance optimisation. It doesn't make sense for the Client Org to be able to override these administrative measures put in place.


One suggestion was to increase the Parent Data Source Row Limit to a higher figure, to something that would capture the results of the report that you were trying to create.


Another was to use the Link to Filter operator to filter both the Master Query and the Subquery at the same time, reducing the number of rows returned in both queries in order to get the correct output. This should optimise the number of rows returned by the query.


Let me know if this explanation makes sense, and let me know how you go with creating the report.


Kind regards,

Simon

photo
1

Hi Simon,

Thanks for checking with the developers on this. Based on this, I've spent some time trying different combos and its effect on including/omitting results and on performance. (Performance with no row limit has been hard to check given current NBN challenges, but it has had an impact.)

The challenge in this scenario will be when the fields used to filter the master query are specific to a child org and not in the subquery's parent org view - part of the very logic as to why they are split between child/parent views/sources. Shared standard fields could limit the result set to search through but it may not reduce it to a complete set that is below the row limit.

But perhaps the more puzzling aspect is that if the subquery is meant to operate independently, why does the master query's row limit override the subquery's source/view row limit but only when it is lower? I.e.:

  • if I have the master at 50,000, and a higher (or no) limit on the subquery's source, the subquery will also get limited to 50,000 and omit joins that I'm after.
  • conversely, if I have the master at 50,000, and a lower limit on the subquery's source, the subquery will get limited to that lower limit and ignore the master.

If the row limits are different, wouldn't it then be consistent for the subquery to be able to search the database (to its own limit) for joins and then be limited in line with the master after matches are made? Otherwise, having different limits only works on single or simple queries. And setting a higher row limit on the parent source won't change the behaviour for our reports that subquery from the client org to the parent source.

Appreciate your time assisting with this - I really want to make sure I understand the ins and outs as this behaviour is a core component of our analytics design.

Thanks,

Brendan

photo
1

Hi Brendan,


It's no problem at all, I'm happy to walk through this with you - I'm learning quite a bit as well!


The way I was explained, is that when a Client Org Data Source is defined using Data Source Substitution, it is NOT a separate Data Source, rather it is a View that lives with the Parent Org, underneath the Parent Data Source. This means that the Data Source > View > Report, Row Limit logic hierarchy applies. In the Configuration Database, it would be a View with the IpOrg value of '1'.


Is there a reason why you wouldn't be able to use the same filter fields to the Parent Data Source, you should have the same schema definition for the Parent and Client Org, or am I missing something here? For example enabling a particular field in the Parent View when you are creating an Advanced Subquery in the Client Org that requires it?


It might be helpful to get a database schema so I can follow along, it's a little tricky to understand what your restrictions might be without some sort of visual aid! If you would like to send it, you can upload to https://ftp.yellowfin.bi and let me know the name of the file.


Kind regards,

Simon

photo
1

Hey Simon,

The key difference in the parent org vs client org schema is that the client org databases also include additional tables with custom fields in them that are relevant only to that specific client org. So there's a whole range of standard tables using standard fields, processes and workflows from our system that are found in all client databases (and thus can also be used in the parent org views) but clients also have their own fields plus unique custom tables that can only be added to their client org views.

Those custom tables are joined by and placed in context by the standard tables in the custom views, and usually contain the key fields and filters used to drive each client's reports and metrics. For instance, job data (standard table) is linked to application data (standard table), but each client will also have custom job data (linked to the standard job data) and custom application data (linked to the standard application data).

And so it's a delicate balance of having custom views available in the client org with these custom fields included per client, and having a centralised set of views (and reports built on them) available via the parent org (with data source substitution). These centralised views are also more specific and advanced in nature; the custom views more general and simple to help our more occasional users (and another reason for the mix of standard and custom data in them).

The nexus that I'm refining now is when they need to be combined in order to mix that custom data with the advanced standard views when the clients have years and years of historical data that could be matched.

If you need more info on this, what would be most useful - example view SQL text perhaps?

Kind regards,

Brendan

photo
1

Hi Brendan,


Thanks for the explanation, I'm understanding your use case a little better now and I do understand how the current Yellowfin configuration doesn't lend itself to this particular scenario.


What I might suggest, is that I put you in contact with one of our consultants who might be able to provide a better solution than I might be able to come up with. What I understand from what you've told me, and the conversation that I've had with our Developers, is that your particular scenario accounts for a very specific use case, and a change as an Enhancement to the Client Org/ Parent Org Data Substitution hierarchy would be difficult to deploy to the wider Yellowfin Community.


Does this sound OK to you?


Kind regards,

Simon

photo
1

Hi Simon,

Yes please, if a consultant might have some advice for our scenario that would be great. Thanks again for getting across the situation as it is for now.

Kind regards,

Brendan

photo
1

Hi Brendan,


I've passed this message on to your CSM who will be in contact with you shortly.


In the meantime, I will mark this ticket as completed, however, please don't hesitate to reach out if you have any further queries that I might be able to help with from a Support perspective! Please also let me know if you are yet to hear from your CSM and I'll be sure to follow that up for you.


Kind regards,

Simon