Advanced tools for improving performance with Redshift et al
Lately we've been doing some performance testing with a migration to Amazon Redshift, and what we've found is that Yellowfin doesn't allow us to get the most out of it performance wise, due to limitations in functionality. I'll demonstrate with some examples.
So the naive way to model and report on a data model such as the following, will produce SQL as follows:
select data_table.data_col from data_table join date_to_data on date_to_data.data_id = data_table.data_id join date_time on date_time.date_time_key = date_to_data.date_time_key where date_time.start_date between @start_time and @end_time
You'll notice we really want to filter by date here.
However, the problem here is that to maximise the performance with Redshift, we want to first explicitly restrict the range of the lookup on data_table (which is presumed to be big) to the minimum possible. Ideally, the table date_to_data should be pre-filtered as well.This is because unlike with a traditional DBMS, Redshift will avoid nested loops at all costs because it doesn't scale across nodes.
So the optimal query here needs to become something more like the following, which we can achieve with a more sophisticated approach:
with dates as ( select min(date_time_key) as min_date_time_key, max(date_time_key) as max_date_time_key from date_time where start_date between @start_time and @end_time ) ,ranges as ( select min(data_id) as min_id, max(data_id) as max_id, from date_to_data where date_time_key between (select min_date_time_key from dates) and (select max_date_time_key from dates) ) select data_table.data_col from data_table where data_id between (select min_id from ranges) and (select max_id from ranges)
However, it seems that with Yellowfin's modelling tools, there's currently no way to achieve this approach - even using custom SQL tables.You could maybe do it with stored procedures, but that leaves you with a host of other limitations compared to plain modelling.
Can the Yellowfin team please advice is there is any plans to add this kind of more sophisticated capability into the platform? If not, I'd like to suggest it.