Why is my Yellowfin Database so large, and what can I do about it?

On particularly large and/or long running instances of Yellowfin, the application's configuration database can grow quite large. Why does this happen, and what can be done to bring the size of some of the large tables back down to more reasonable levels? The below information details some of what the large tables are for, and how to more proactively manage them.

Disclaimer: Altering the Yellowfin Configuration Database comes with a large amount of inherent risk. A malformed query has the potential to effectively deconstruct an entire Yellowfin instance, and put into an unusable and unrecoverable state. Make sure that you understand the ramifications of queries below before running them, and the impact they will have on your Yellowfin instance. Always have a up-to-date backups on hand before running queries against the configuration database. Always make sure that Yellowfin is properly shut down before running queries against the configuration database. Contact Support if you have any questions.


Event table

This table stores all YF usage data, such as ; User logins, Running reports, Imports/exports.

This data is all used for auditing only. A full list of what is tracked can be found in the wiki here :

Report Audit and Tracking

EventArchive table

This table stores all of the archived event data, so the data here is simply data that was moved from the 'Event' table after a specified period.

DocumentData table

This table stores a lot of report related data. Not meta-data, actual data. Such as; Cached report result sets (this includes cached report, and snapshots created at the report level) , Freehand SQL (used for virtual tables, custom query filters, freehand SQL reports) , Cached filter results from older releases (we now store the filter results in different tables), CSV reports
See this article if you want to manually remove cached reports from the table.

ReportInstance table

This table stores a record for each report, each time it is run or edited. If you run the same report multiple times, you will get multiple rows for that report in this table.

This is used for things like; Getting 'average report run time' and also used, Using the 'Remember Filter Values' option, Editing a report (the draft copy is a seperate record so you can roll back to the original activated version), Results used for KPI reporting

Identifying which tables are contributing to the large DB size

Acquire the following info:

  • The top 10 tables in the Yellowfin DB by size
  • Results from the query/queries below in CSV format:

select a.CategoryCode,a.SubCategoryCode, COUNT (distinct a.reportid)

from reportheader a, reportinstance b, DocumentItem c

where a.reportid=b.reportid

and a.reportid=c.subjectid

group by a.CategoryCode,a.SubCategoryCode

This query will count the number of reports that are currently cached and in what category they belong.

Please note, it does not count the number of cached copies of that report, this is the main difference with the below sql.

select a.reportid,a.reportname,count (b.ReportInstanceId) as ReportInstance#,a.CategoryCode,a.SubCategoryCode,C.SubjectId

from reportheader a, reportinstance b, DocumentItem c

where a.reportid=b.reportid

and a.reportid=c.subjectid

group by a.CategoryCode,a.SubCategoryCode,a.ReportName,a.reportid, c. SubjectId

This query will return a list of cached reports in the document data table, and count how many reports copies per category. This may take a while to run, and will provide some good info, but not necessary, so feel free to skip if need be.

How to shrink the size of the tables mentioned above:

So how do we shrink the data in these tables? Once the it is understood how and what data that is stored, you need to make the call if you actually want to keep it or not.

PLEASE ensure you have a recent backup of your DB before clearing anything.

Event & EventArchive

All of this can be configured by changing the schedule that the events are archived, and when the archived items are deleted. See the following knowledge base post on how to do this: 

https://community.yellowfinbi.com/knowledge-base/article/how-to-limit-the-size-of-the-event-table_1

Keep in mind, these tables only store usage data, so if you need to audit YF usage, then this data is needed and there is not much you can do, but remove data older than a specific date.

If you want to specifically stop a record being created in the even table each time a report is run, you can use the query below:

INSERT INTO Configuration (IpOrg, ConfigTypeCode, ConfigCode, ConfigData) VALUES (1, 'SYSTEM', 'DISABLEREPORTEVENTS', 'true');

ReportInstance:

The query below should clear the ReportInstance table and leave all KPI reports & cached reports intact.

DELETE FROM ReportInstance WHERE DocumentId = 0 OR DocumentId IS NULL OR ReportId not in (SELECT ReportId FROM ReportHeader WHERE RoleCode = 'KPI')

This will stop records being created in ReportInstance and ReportInstanceFilter tables every time a report is run (please remember what this data is for to ensure you want to do this):

INSERT INTO Configuration (IpOrg, ConfigTypeCode, ConfigCode, ConfigData) VALUES (1, 'SYSTEM', 'CREATEREPORTINSTANCES', 'false');

DocumentData:

Shrinking this table usually means disabling report caching, which can be done through the UI against each relevant report category. Alternatively, you can modify the record in the DB directly by running the query below.

update ContentManagement

set versionhistoryrequiredflag=0 where contentmanagementid =

To get the right ID, just look at all the records in the table and find the relevant subcategory by name.

Subcategories do NOT have ‘INDEX’ as a ContentCode value. ‘INDEX’ is to mark that it is a parent category.

The Subcategory is called ‘NEW’ and lives under the parent category ‘TESTING’. So the ContentManagementId you want is 55108. Once you have disabled report caching, the cached reports will be removed overnight when the cleanup task runs.

To force the cleanup task to run on next Yellowfin startup, shutdown Yellowfin, then run the query below. After startup up Yellowfin, the task will run within a few minutes, and keep in mind it could take some time to run.

update TaskSchedule

set LastRunDate = null, LastRunGMTDateTime = null, LastRunTime = null, LastRunStatus = null

where ScheduleUnitCode='DocumentCleanupTask'

If you have not managed to shrink the clients DB, or have any quesions, please speak to a senior member of support AFTER you have identified the table sizes and their YF usage (e.g. if they want to use caching, KPI etc..)

If the cleanup task above is failing, you can consider the query below:

If you do not want to remove ALL cached reports, and any KPI (Spot type) reports, please do NOT run the queries below. The following query marks cached reports (not the actual report itself, just the cached results) for delete by setting Abstract column to DELETE.

UPDATE DocumentItem SET Abstract = 'DELETE'
WHERE DocumentCode = 'XMLWRAPPER'
AND DocumentId NOT IN (
SELECT MaxDocumentId FROM (
SELECT ReportId, MAX(DocumentId) as MaxDocumentId
FROM ReportInstance
WHERE ReportId IN (
SELECT ReportId FROM ReportHeader WHERE RoleCode = 'KPI'
)
OR ReportStatusCode = 'SNAPSHOT'
GROUP BY ReportId
) as t
);

This will then delete the items marked as deleted from the DocumentItems table:.

DELETE FROM DocumentItem
WHERE Abstract = 'DELETE";

This will then delete the items marked as deleted from the DocumentRevision table:

DELETE FROM DocumentRevision
WHERE DocumentId NOT IN (SELECT DocumentId FROM DocumentItem);

This will then delete the items marked as deleted from the DocumentData table:

DELETE FROM DocumentData
WHERE RevisionId NOT IN (SELECT RevisionId FROM DocumentRevision);

Other queries they may be useful for cleaning up data;

DELETE FROM DocumentRevision WHERE DocumentId NOT IN (SELECT DocumentId FROM DocumentItem)

Remove orphaned records from DocumentRevisionTable

Delete from DocumentData where RevisionID not in (Select RevisionID from DocumentRevision)

Remove orphaned records from DocumentDataTable

DELETE FROM ReportInstanceFilter

WHERE InstanceId NOT IN (

SELECT ReportInstanceId FROM ReportInstance )

This will remove filters belonging to ReportInstance records that no longer exist

DELETE FROM ReportInstance

WHERE DocumentId != 0 and DocumentId IS NOT NULL AND DocumentId NOT IN (

select DocumentId From DocumentItem)

This will delete orphaned records from ReportInstance

---------------------------------------------------------------

Also, the following queries will safely clean up any orphaned records that for some reason are linked to deleted reports or views:

DELETE FROM ReportFieldParameter WHERE ReportId IN (SELECT ReportId FROM ReportHeader WHERE ReportStatusCode = 'DELETED');

DELETE FROM ReportFieldParameter WHERE ViewId IN (select viewId from reportview where viewstatuscode = 'DELETED'); 

DELETE FROM ReportFieldTemplate WHERE Viewid IN (select viewid from ReportView where viewstatuscode = 'DELETED');

DELETE FROM ReportFormat WHERE ReportId IN (SELECT REPORTID FROM REPORTHeader WHERE ReportStatusCode = 'DELETED');

DELETE FROM CachedFilterValue WHERE CachedFilterId in (select CachedFilterId from CachedFilter where reportid in (SELECT REPORTID FROM REPORTHeader WHERE ReportStatusCode = 'DELETED'));

DELETE FROM ReportInstance WHERE ReportId IN (SELECT REPORTID FROM REPORTHeader WHERE ReportStatusCode = 'DELETED");

Is article helpful?