This article is for users who are setting up their first Yellowfin Signal, or who want some tips on troubleshooting existing Signals. For anything else Signals-related, please check our wiki.
This article provides a brief overview of how best to prepare your view for use with Signals, answers some common questions, and offers some troubleshooting tips.
How does Yellowfin Signals work its magic?
First, you need a view to base your Signals on. You choose your date, metric and dimension fields, and the data for those fields is pulled from the view and stored in memory, where Signals will work its magic on it. Think of this dataset as the baseline for Signals: data outside of this view cannot be used.
So now that we have a baseline dataset in memory, the real magic starts to happen.
Sub-tasks are then created, each of which generates its own dataset (derived from the baseline) and stores it in memory. These sub-tasks are displayed in Schedule Management. The number of sub-tasks generated depends on a few factors:
- Row-level security (e.g. source filters, data source substitution)
Analysis is then performed on each sub-task's dataset, and when Signals are detected, the data for each Signal is cached in the ReportInstance table (unless Signal caching has been turned off, in which case the query has to be run against the data source directly). The Signal's metadata is also saved to the database, along with some extra data fragments for certain Signal types.
The data for each sub-task is held in memory until analysis for that sub-task is complete, at which point that memory is freed. This data is not shared between sub-tasks; each sub-task holds its own.
Once all sub-tasks are completed, the baseline data set is removed from memory.
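To make the lifecycle above a bit more concrete, here's a minimal Python sketch of it. This is not Yellowfin's code: the row layout, `load_baseline`, `split_into_subtasks` and the threshold rule in `detect_signals` are all hypothetical stand-ins. Sub-tasks are split on a made-up `security_group` column to mimic row-level security, and for simplicity they run one after another rather than as scheduled jobs.

```python
from collections import defaultdict
from statistics import mean


def load_baseline():
    """Hypothetical rows pulled from the view: (date, dimension, security_group, metric).
    security_group stands in for row-level security such as source filters."""
    return [
        ("2024-01", "North", "group_a", 100.0),
        ("2024-02", "North", "group_a", 105.0),
        ("2024-03", "North", "group_a", 300.0),
        ("2024-01", "South", "group_b", 50.0),
        ("2024-02", "South", "group_b", 52.0),
        ("2024-03", "South", "group_b", 51.0),
    ]


def split_into_subtasks(baseline):
    """Derive one dataset per security group from the shared baseline."""
    subtasks = defaultdict(list)
    for row in baseline:
        subtasks[row[2]].append(row)
    return dict(subtasks)


def detect_signals(rows, factor=2.0):
    """Placeholder analysis (NOT Yellowfin's algorithm): flag any value that is
    more than `factor` times the mean of the other values in the sub-task."""
    found = []
    for i, (date, dim, _group, value) in enumerate(rows):
        others = [r[3] for j, r in enumerate(rows) if j != i]
        if others and value > factor * mean(others):
            found.append((dim, date, value))
    return found


def run_signal_job(load_view):
    baseline = load_view()                    # baseline dataset held in memory
    subtasks = split_into_subtasks(baseline)  # each sub-task gets its own dataset
    results = {}
    for name in list(subtasks):
        rows = subtasks.pop(name)             # datasets are not shared between sub-tasks
        results[name] = detect_signals(rows)  # detected Signals would be cached at this point
        del rows                              # sub-task data released once its analysis ends
    del baseline                              # baseline released after all sub-tasks finish
    return results


signals = run_signal_job(load_baseline)
print(signals)
```

Running it flags the March value for group_a and nothing for group_b. The point is the shape of the job (one baseline, a separate dataset per sub-task, memory released as each step finishes), not the detection logic.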
Things to look out for and additional info
The following covers some best practices, along with additional info that can help when troubleshooting issues.
Set an appropriate amount of memory for Java.
As the baseline data is stored in memory, it's best to understand what your view is returning, and whether all of it is necessary. For example, do you have data that should be filtered out?
Are you using timestamps and returning rows for every second, when you only want to see Signals based on monthly periods?
Do you have a large number of unique dimension values?
Are you using source filters or other items that create row-level security (and therefore more sub-tasks)?
All of this is fine if you have enough memory to run all these jobs, but if you don't, consider aggregating the dataset or clustering Yellowfin.
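A quick back-of-the-envelope calculation shows why the date grain matters so much. The sketch below assumes one year of data, a hypothetical 500 unique dimension values, and the worst case of one row per time bucket per dimension value; your real row counts depend on your data.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS = 365
UNIQUE_DIMENSION_VALUES = 500  # hypothetical figure for illustration


def estimated_rows(time_buckets, unique_dimension_values):
    """Upper bound: one row per time bucket for every dimension value."""
    return time_buckets * unique_dimension_values


grains = {
    "second": DAYS * SECONDS_PER_DAY,
    "hour": DAYS * 24,
    "day": DAYS,
    "month": 12,
}

for grain, buckets in grains.items():
    print(f"{grain:>6}: {estimated_rows(buckets, UNIQUE_DIMENSION_VALUES):,} rows")
```

That works out to 15,768,000,000 rows at one-second grain, 4,380,000 at hourly, 182,500 at daily and 6,000 at monthly, all before any extra sub-tasks from row-level security are created.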
Should you be aggregating dates at the view or signal level?
This ties directly in with the above.
If you have a timestamp field returning unique rows for every second or hour, but are using days, months or years as the Signal date period, that's a lot of extra data stored in memory only to be aggregated before use. To save some memory, you could do the date aggregation in the view itself.
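In practice, view-level aggregation would be a grouping done in the view or database, not in Python. The sketch below only illustrates the effect: per-second rows (a hypothetical `(timestamp, dimension, metric)` layout) collapse into one row per month and dimension before anything is held in memory. `aggregate_to_month` is a made-up helper, not a Yellowfin function.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_to_month(rows):
    """Sum (timestamp, dimension, metric) rows into (YYYY-MM, dimension) buckets."""
    totals = defaultdict(float)
    for ts, dim, value in rows:
        totals[(ts.strftime("%Y-%m"), dim)] += value
    return dict(totals)


# One hour of per-second rows straddling a month boundary.
start = datetime(2024, 1, 31, 23, 30)
rows = [(start + timedelta(seconds=i), "North", 1.0) for i in range(3600)]

monthly = aggregate_to_month(rows)
print(len(rows), "->", len(monthly))
print(monthly)
```

Here 3,600 rows become 2, one for January and one for February, each with a total of 1800.0.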
Using calc fields with date aggregations
If you have calc fields and are also doing in-memory date aggregations, the values returned by the calc fields will not be accurate.
In cases like this, it's best to do your date aggregations in the view.
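We can't show exactly how Yellowfin evaluates calc fields during in-memory aggregation, but the sketch below illustrates one common way a calculated value goes wrong when it is worked out per row and then aggregated, instead of being worked out on already-aggregated data. The `margin` calc field and the revenue/profit figures are hypothetical.

```python
rows = [
    {"month": "2024-01", "revenue": 100.0, "profit": 50.0},
    {"month": "2024-01", "revenue": 900.0, "profit": 90.0},
]


def margin(profit, revenue):
    """Hypothetical calc field: profit margin."""
    return profit / revenue


# Calc field evaluated on each row, then the results averaged into the month.
per_row = [margin(r["profit"], r["revenue"]) for r in rows]
avg_of_row_margins = sum(per_row) / len(per_row)

# Metrics aggregated to the month first (as a view-level aggregation would), then the calc field.
monthly_margin = margin(sum(r["profit"] for r in rows), sum(r["revenue"] for r in rows))

print(f"average of per-row margins: {avg_of_row_margins:.2f}")
print(f"margin of monthly totals:   {monthly_margin:.2f}")
```

The first line prints 0.30 and the second 0.14; only the second matches the real monthly margin of 140 profit on 1,000 revenue.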
Signals are historic.
If a Signal was generated, it means the event has already happened.
Signal caching is on by default, and shouldn't need to be turned off.
If Signal caching is turned off, then each time a Signal is opened, Yellowfin has to use the stored metadata to run the query again against the actual dataset in the database, which will always be slower.
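Here's a minimal sketch of the difference, assuming a simplified model in which metadata is always saved and the Signal's data is cached at detection time only when caching is on. `SignalStore`, `record_signal`, `open_signal` and `fake_query` are hypothetical names; the `cached_data` dictionary merely stands in for the ReportInstance table.

```python
class SignalStore:
    """Hypothetical model of Signal storage, with caching on or off."""

    def __init__(self, caching_enabled=True):
        self.caching_enabled = caching_enabled
        self.metadata = {}     # Signal metadata is always saved
        self.cached_data = {}  # stands in for cached rows in ReportInstance
        self.queries_run = 0

    def record_signal(self, signal_id, query, data):
        """Called when a Signal is detected."""
        self.metadata[signal_id] = query
        if self.caching_enabled:
            self.cached_data[signal_id] = data

    def open_signal(self, signal_id, run_query):
        """Serve cached data if present, otherwise re-run the stored query."""
        if signal_id in self.cached_data:
            return self.cached_data[signal_id]
        self.queries_run += 1
        return run_query(self.metadata[signal_id])


def fake_query(query):
    """Stand-in for running the query against the source database."""
    return [("2024-03", 300.0)]


cached = SignalStore(caching_enabled=True)
uncached = SignalStore(caching_enabled=False)
for store in (cached, uncached):
    store.record_signal("sig-1", "hypothetical signal query", [("2024-03", 300.0)])
    for _ in range(3):
        store.open_signal("sig-1", fake_query)

print(cached.queries_run, uncached.queries_run)
```

Opening the same Signal three times runs no queries with caching on and three queries with it off.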
The number of sub-tasks cannot be directly controlled.
Sub-tasks are generated based on your data and configuration settings. You can, however, see how many sub-tasks a Signal job has created by checking the task in Schedule Management.
Sub-tasks run just like any other scheduled job.
The number of tasks, and the times they run, are all determined by your Signal job. So while you cannot change much about the sub-tasks, you can configure how many scheduled jobs run at once. By default, Yellowfin runs 5 scheduled tasks at once, though this can be changed in your web.xml. Please reach out to support if you want to make changes to this.
This also means you can cluster Yellowfin and have only 1 node run these background Signal jobs.
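The effect of a cap on concurrent scheduled tasks can be illustrated with an ordinary Python thread pool. This is not Yellowfin's scheduler; `run_subtasks` is a hypothetical helper, and the 5-worker limit simply mirrors the default mentioned above. Extra sub-tasks wait in the queue until a slot frees up, so more sub-tasks means a longer job, not more parallel work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_TASKS = 5  # mirrors the documented default of 5 scheduled tasks


def run_subtasks(n_subtasks, max_workers=MAX_CONCURRENT_TASKS):
    """Run n_subtasks through a capped pool and record peak concurrency."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def subtask(i):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for analysing this sub-task's dataset
        with lock:
            running -= 1
        return i

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        completed = list(pool.map(subtask, range(n_subtasks)))
    return completed, peak


completed, peak = run_subtasks(12)
print(f"{len(completed)} sub-tasks completed, at most {peak} running at once")
```

All 12 sub-tasks complete, but never more than 5 run at the same time.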
If the ReportInstance table is too large
First, make sure the growth is related to Signals, and if so, consider turning off the Signal caching option. Keep in mind this will slow down opening a Signal. Please reach out to support if you want to make changes to this.
Signals are stored just like other YF user created content.
When a Signal configuration is deleted, its scheduled task is removed, but the other records are left in the tables and simply flagged as deleted.
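This is the common "soft delete" pattern. The sketch below shows it with an in-memory SQLite database; the `signal_config` and `scheduled_task` tables and their columns are hypothetical and do not reflect Yellowfin's actual schema.

```python
import sqlite3


def demo_soft_delete():
    """Delete a hypothetical Signal configuration using a soft delete."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE signal_config (id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER DEFAULT 0);
        CREATE TABLE scheduled_task (id INTEGER PRIMARY KEY, config_id INTEGER);
        INSERT INTO signal_config (id, name) VALUES (1, 'Sales signals');
        INSERT INTO scheduled_task (id, config_id) VALUES (10, 1);
        """
    )
    # The scheduled task row is removed outright...
    db.execute("DELETE FROM scheduled_task WHERE config_id = ?", (1,))
    # ...while the configuration row stays, flagged as deleted.
    db.execute("UPDATE signal_config SET deleted = 1 WHERE id = ?", (1,))
    db.commit()
    tasks_left = db.execute("SELECT COUNT(*) FROM scheduled_task").fetchone()[0]
    configs = db.execute("SELECT id, deleted FROM signal_config").fetchall()
    db.close()
    return tasks_left, configs


tasks_left, configs = demo_soft_delete()
print(tasks_left, configs)
```

After the delete, no scheduled task remains, but the configuration row is still there with its deleted flag set.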
If you have any additional questions, or just want to leave some feedback on Yellowfin Signals, please let us know.