On particularly large and/or long running instances of Yellowfin, the application's configuration database can grow quite large. Why does this happen, and what can be done to bring the size of some of the large tables back down to more reasonable levels? The below information details some of what the large tables are for, and how to more proactively manage them.
Disclaimer: Altering the Yellowfin Configuration Database comes with a large amount of inherent risk. A malformed query has the potential to effectively deconstruct an entire Yellowfin instance, and put into an unusable and unrecoverable state. Make sure that you understand the ramifications of queries below before running them, and the impact they will have on your Yellowfin instance. Always have a up-to-date backups on hand before running queries against the configuration database. Always make sure that Yellowfin is properly shut down before running queries against the configuration database. Contact Support if you have any questions.
Event table
This table stores all YF usage data, such as ; User logins, Running reports, Imports/exports.
This data is all used for auditing only. A full list of what is tracked can be found in the wiki here :
EventArchive table
This table stores all of the archived event data, so the data here is simply data that was moved from the 'Event' table after a specified period.
DocumentData table
This table stores a lot of report related data. Not meta-data, actual data. Such as; Cached report result sets (this includes cached report, and snapshots created at the report level) , Freehand SQL (used for virtual tables, custom query filters, freehand SQL reports) , Cached filter results from older releases (we now store the filter results in different tables), CSV reports
See this article if you want to manually remove cached reports from the table.
You can run the following query to identify what exactly is stored in your DocumentData table by count
select di.DocumentTypeCode, di.DocumentCode, di.StatusCode, di.MediaTypeCode, count(di.DocumentId) as DocumentItemCount, dr.DocumentRevisionCount, dd.DocumentDataCount
from DocumentItem di
left outer join (
select i.DocumentTypeCode, i.DocumentCode, i.StatusCode, i.MediaTypeCode, count(r.RevisionId) as DocumentRevisionCount
from DocumentItem i, DocumentRevision r
where i.DocumentId = r.DocumentId
group by i.DocumentTypeCode, i.DocumentCode, i.StatusCode, i.MediaTypeCode
) dr
on di.DocumentTypeCode = dr.DocumentTypeCode
and di.DocumentCode = dr.DocumentCode
and di.StatusCode = dr.StatusCode
and di.MediaTypeCode = dr.MediaTypeCode
left outer join (
select i.DocumentTypeCode, i.DocumentCode, i.StatusCode, i.MediaTypeCode, count(d.DataSeqNbr) as DocumentDataCount
from DocumentItem i, DocumentRevision r, DocumentData d
where i.DocumentId = r.DocumentId
and r.RevisionId = d.RevisionId
group by i.DocumentTypeCode, i.DocumentCode, i.StatusCode, i.MediaTypeCode
) dd
on di.DocumentTypeCode = dd.DocumentTypeCode
and di.DocumentCode = dd.DocumentCode
and di.StatusCode = dd.StatusCode
and di.MediaTypeCode = dd.MediaTypeCode
group by di.DocumentTypeCode, di.DocumentCode, di.StatusCode, di.MediaTypeCode, dr.DocumentRevisionCount, dd.DocumentDataCount
ReportInstance table
This table stores a record for each report, each time it is run or edited. If you run the same report multiple times, you will get multiple rows for that report in this table.
This is used for things like; Getting 'average report run time' and also used, Using the 'Remember Filter Values' option, Editing a report (the draft copy is a separate record so you can roll back to the original activated version), Results used for KPI reporting.
This table also stores any Bookmarks or Snapshots you may have.
Identifying which tables are contributing to the large DB size
Acquire the following info:
- The top 10 tables in the Yellowfin DB by size
- Results from the query/queries below in CSV format:
select a.CategoryCode,a.SubCategoryCode, COUNT (distinct a.reportid)
from reportheader a, reportinstance b, DocumentItem c
where a.reportid=b.reportid
and a.reportid=c.subjectid
group by a.CategoryCode,a.SubCategoryCode
This query will count the number of reports that are currently cached and in what category they belong.
Please note, it does not count the number of cached copies of that report, this is the main difference with the below sql.
select a.reportid,a.reportname,count (b.ReportInstanceId) as ReportInstance#,a.CategoryCode,a.SubCategoryCode,C.SubjectId
from reportheader a, reportinstance b, DocumentItem c
where a.reportid=b.reportid
and a.reportid=c.subjectid
group by a.CategoryCode,a.SubCategoryCode,a.ReportName,a.reportid, c. SubjectId
This query will return a list of cached reports in the document data table, and count how many reports copies per category. This may take a while to run, and will provide some good info, but not necessary, so feel free to skip if need be.
How to shrink the size of the tables mentioned above:
So how do we shrink the data in these tables? Once the it is understood how and what data that is stored, you need to make the call if you actually want to keep it or not.
PLEASE ensure you have a recent backup of your DB before clearing anything.
Event & EventArchive
All of this can be configured by changing the schedule that the events are archived, and when the archived items are deleted. See the following knowledge base post on how to do this:
https://community.yellowfinbi.com/knowledge-base/a...
Keep in mind, these tables only store usage data, so if you need to audit YF usage, then this data is needed and there is not much you can do, but remove data older than a specific date.
If you want to specifically stop a record being created in the even table each time a report is run, you can use the query below:
INSERT INTO Configuration (IpOrg, ConfigTypeCode, ConfigCode, ConfigData) VALUES (1, 'SYSTEM', 'DISABLEREPORTEVENTS', 'true');
ReportInstance:
The query below should clear the ReportInstance table and leave all KPI reports, cached reports, Bookmarks and Snapshots intact.
DELETE FROM ReportInstance WHERE ReportStatusCode NOT IN ('BOOKMARK','SNAPSHOT', 'BROADCASTBOOKMARK', 'DISTRIBUTEBOOKMARK') AND (DocumentId = 0 OR DocumentId IS NULL OR ReportId not in (SELECT ReportId FROM ReportHeader WHERE RoleCode = 'KPI'));
The query below will stop records being created in ReportInstance and ReportInstanceFilter tables every time a report is run (please remember what this data is for to ensure you want to do this):
INSERT INTO Configuration (IpOrg, ConfigTypeCode, ConfigCode, ConfigData) VALUES (1, 'SYSTEM', 'CREATEREPORTINSTANCES', 'false');
Note: this is separate from the Version History settings found in the Content Folders settings. The 'Current Version Only' setting here only applies to report saves, so report run entries will still be generated each report load/re-load without above query being executed.
DocumentData:
Shrinking this table usually means disabling report caching, which can be done through the UI against each relevant report category. Alternatively, you can modify the record in the DB directly by running the query below.
update ContentManagement
set versionhistoryrequiredflag=0 where contentmanagementid =
To get the right ID, just look at all the records in the table and find the relevant subcategory by name.
Subcategories do NOT have ‘INDEX’ as a ContentCode value. ‘INDEX’ is to mark that it is a parent category.
The Subcategory is called ‘NEW’ and lives under the parent category ‘TESTING’. So the ContentManagementId you want is 55108. Once you have disabled report caching, the cached reports will be removed overnight when the cleanup task runs.
To force the cleanup task to run on next Yellowfin startup, shutdown Yellowfin, then run the query below. After startup up Yellowfin, the task will run within a few minutes, and keep in mind it could take some time to run.
update TaskSchedule
set LastRunDate = null, LastRunGMTDateTime = null, LastRunTime = null, LastRunStatus = null
where ScheduleUnitCode='DocumentCleanupTask'
If you have not managed to shrink the clients DB, or have any quesions, please speak to a senior member of support AFTER you have identified the table sizes and their YF usage (e.g. if they want to use caching, KPI etc..)
If the cleanup task above is failing, you can consider the query below:
If you do not want to remove ALL cached reports, and any KPI (Spot type) reports, please do NOT run the queries below. The following query marks cached reports (not the actual report itself, just the cached results) for delete by setting Abstract column to DELETE.
UPDATE DocumentItem SET Abstract = 'DELETE'
WHERE DocumentCode = 'XMLWRAPPER'
AND DocumentId NOT IN (
SELECT MaxDocumentId FROM (
SELECT ReportId, MAX(DocumentId) as MaxDocumentId
FROM ReportInstance
WHERE ReportId IN (
SELECT ReportId FROM ReportHeader WHERE RoleCode = 'KPI'
)
OR ReportStatusCode = 'SNAPSHOT'
GROUP BY ReportId
) as t
);
This will then delete the items marked as deleted from the DocumentItems table:.
DELETE FROM DocumentItem
WHERE Abstract = 'DELETE";
This will then delete the items marked as deleted from the DocumentRevision table:
DELETE FROM DocumentRevision
WHERE DocumentId NOT IN (SELECT DocumentId FROM DocumentItem);
This will then delete the items marked as deleted from the DocumentData table:
DELETE FROM DocumentData
WHERE RevisionId NOT IN (SELECT RevisionId FROM DocumentRevision);
Other queries they may be useful for cleaning up data;
DELETE FROM DocumentRevision WHERE DocumentId NOT IN (SELECT DocumentId FROM DocumentItem)
Remove orphaned records from DocumentRevisionTable
Delete from DocumentData where RevisionID not in (Select RevisionID from DocumentRevision)
Remove orphaned records from DocumentDataTable
DELETE FROM ReportInstanceFilter
WHERE InstanceId NOT IN (
SELECT ReportInstanceId FROM ReportInstance )
This will remove filters belonging to ReportInstance records that no longer exist
DELETE FROM ReportInstance
WHERE DocumentId != 0 and DocumentId IS NOT NULL AND DocumentId NOT IN (
select DocumentId From DocumentItem)
This will delete orphaned records from ReportInstance
---------------------------------------------------------------
Also, the following queries will safely clean up any orphaned records that for some reason are linked to deleted reports or views:
DELETE FROM ReportFieldParameter WHERE ReportId IN (SELECT ReportId FROM ReportHeader WHERE ReportStatusCode = 'DELETED');
DELETE FROM ReportFieldParameter WHERE ViewId IN (select viewId from reportview where viewstatuscode = 'DELETED');
DELETE FROM ReportFieldTemplate WHERE Viewid IN (select viewid from ReportView where viewstatuscode = 'DELETED');
DELETE FROM ReportFormat WHERE ReportId IN (SELECT REPORTID FROM REPORTHeader WHERE ReportStatusCode = 'DELETED');
DELETE FROM CachedFilterValue WHERE CachedFilterId in (select CachedFilterId from CachedFilter where reportid in (SELECT REPORTID FROM REPORTHeader WHERE ReportStatusCode = 'DELETED'));
DELETE FROM ReportInstance WHERE ReportId IN (SELECT REPORTID FROM REPORTHeader WHERE ReportStatusCode = 'DELETED");