A client had a Yellowfin database that was huge and they wanted to reduce the size of the larger tables by only keeping the 5 most recent versions of reports.
However, because their DB was so huge, the normal queries we give to reduce the ReportInstance and DocumentData tables were taking too long to run, so we came up with the following solution (it's for SQL Server, but the key features of it, i.e. using a TEMP TABLE and also using the ROW_NUMBER feature, are available in the other well-known databases )
This 1st query sets up all likely candidate DocumentIds into a TempTable:
SELECT ri.DocumentId, ri.ReportId, ri.ReportGMTDateTime into ##TempTable FROM ContentManagement cm, ReportHeader rh, ReportInstance ri WHERE cm.ContentEntityCode = 'REPORT' AND cm.ContentCode = rh.SubCategoryCode AND rh.ReportId = ri.ReportId AND ri.DocumentId IS NOT NULL AND ri.DocumentId != 0 AND ri.RequestTypeCode != 'COMMENT'
and then the next query selects only those that are not the most 5 recent versions
WITH TOPFIVE AS ( SELECT *, ROW_NUMBER() over ( PARTITION BY [ReportId] order by [ReportGMTDateTime] ) AS RowNo FROM ##TempTable ) SELECT * FROM TOPFIVE WHERE RowNo > 5
So obviously once you've got your list of DocumentId values from the above query, then it's up to you to join them to other tables that you would like to clean up.