Only restart of Smart Reporting can get broadcast schedules to run again

Nick shared this question 9 months ago
Answered

Hi,

We are perplexed by an issue we are seeing across several sites: once all the broadcast schedules stop running, only a restart of Smart Reporting (SR) gets them going again.

So far, going by the logs, the behavior we have observed is that all 5 background queues fill up, and the 5 tasks already queued never seem to start or finish running (see 15160-09-18 for some context).

In the current scenario, however, the broadcast schedules seem to have stopped running after the following error in the log (see smartreporting.log.6, from before the restart):

ERROR (552): The SQL database operation failed.; The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request.

The problem is that we cannot tell from the logs which report caused the error, and we also don't understand why an error like that would cause broadcast schedules to stop running.

Can you explain why all broadcasts stop running and need a restart to recover?

I have uploaded the logs via the FTP site for your review, before and after restart.

Regards,

Nick

Comments (8)

Hey Nick,

There have been issues in the past with broadcasts in earlier versions. We have created an article about this issue here. It also goes into more detail on how best to prevent this from happening.

Please let me know if you need anything else.

Thank you,

Paul

Hi Paul,

Thanks for the quick response.

The link doesn't seem to work from my end.

Regards,

Nick

Hi Nick,

I also looked into the error, which looks like a SQL-specific error in handling the query being sent to the database. Perhaps try modifying the filters, or check the query being sent and see how many IN statements (and values in them) there are? I saw some info here and here.
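
For what it's worth, a common way to stay under the 2100-parameter limit is to split a long IN list into batches that each fit. The sketch below is a generic Python illustration, not Yellowfin's own code; the table and column names (`orders`, `customer_id`) and the helper names are made up, and whether this applies depends on how the report's filters generate the query.

```python
from typing import Iterator, List, Sequence, Tuple

# The error above reports a server maximum of 2100 parameters per request.
# Stay below it to leave room for any other parameters in the query.
MAX_PARAMS = 2000


def chunked(values: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `values`, each at most `size` long."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_in_queries(ids: Sequence[int], batch_size: int = MAX_PARAMS) -> List[Tuple[str, list]]:
    """Return one (sql, params) pair per batch instead of one huge IN list."""
    queries = []
    for batch in chunked(ids, batch_size):
        placeholders = ", ".join("?" for _ in batch)
        # Hypothetical table and column names, for illustration only.
        sql = f"SELECT * FROM orders WHERE customer_id IN ({placeholders})"
        queries.append((sql, list(batch)))
    return queries
```

With 4500 filter values this produces three statements instead of one, each well inside the limit.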

Cheers,

Paul

Hmm. The article seems to be private for some reason, so I have copied and pasted it here. Sorry about that.

----

This article applies to the scenario where Yellowfin itself seems to be running OK, except for the background tasks; for example, scheduled broadcasts/filters fail to run.

The schedules themselves do not display errors, only that the schedule was missed. Hitting ‘Run Now’ for the schedule will get it to run.

If Yellowfin is NOT running this article does not apply.

A bit of info on how the scheduler works:

The scheduler runs scheduled tasks in the background using its own thread and its own connection to the data source (if using the secondary connection pool option).

By default, at most 5 background threads can run concurrently.

The scheduler checks every minute for tasks it needs to run.

If a schedule is detected as missed, the scheduler will attempt to re-run it, depending on the frequency. For example, a daily report that failed to run today is retried until it gets close to the next run time, at which point it simply waits to run then.

In the 7.3 April 2017 and later releases, broadcasts were improved so that sending one report to multiple people runs the report only once, unless something changes the data based on user login, e.g. Source Filters.

The Problem:

Yellowfin background tasks get into a stalled/blocked state. The background queue holds items that keep failing to complete and are re-run every minute, which hogs the queue.

The cause/s:

  • Out of Memory – The queue is stuck because one or more items in it have used up the memory needed to run more reports. You should see out-of-memory errors in the logs.
  • Query taking too long – In this case, you wouldn’t know it is happening from the UI.
  • Flaky DB connection – In this case, you should see connection-related errors in the logs.
  • Not enough open threads – In this case, you won’t see anything in the logs or UI, though you may see something in info_threads.
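
To make the stalled state concrete, here is a minimal Python sketch of a 5-worker pool whose workers are all occupied by tasks that never return. It is a generic thread-pool simulation under that assumption, not Yellowfin's scheduler code, and the function name `demo_stalled_pool` is made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

WORKERS = 5  # the default number of concurrent background threads


def demo_stalled_pool() -> dict:
    release = threading.Event()  # stands in for "the stuck work finally ends"
    result = {}
    pool = ThreadPoolExecutor(max_workers=WORKERS)
    try:
        # Five tasks that never finish on their own occupy every worker.
        for _ in range(WORKERS):
            pool.submit(release.wait)
        # A perfectly healthy task now has no thread to run on.
        healthy = pool.submit(lambda: "broadcast sent")
        try:
            healthy.result(timeout=0.5)
            result["ran_while_blocked"] = True
        except TimeoutError:
            result["ran_while_blocked"] = False
        # Clearing the stuck tasks frees the workers again.
        release.set()
        result["after_release"] = healthy.result(timeout=5)
    finally:
        release.set()
        pool.shutdown(wait=True)
    return result
```

In this model the healthy task only runs once the stuck tasks are cleared, which would be consistent with schedules resuming only after a restart.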

The Solution:

This depends on the cause, and starts with a deep understanding of the schedules that are running. What is running, how often, and to whom?

Then:

1. Try to work out roughly when the schedule stops.

2. Work out what is running around that time.

3. Can you pause some of these broadcasts? If you do, does the schedule now complete, or does it simply take longer before it dies?

4. Focus on the items now in question: what can we do to space them out, run them less frequently, or make the reports run faster?
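
For steps 1 and 2, it can help to pull out the log lines that precede a known error. The Python sketch below makes no assumption about the log line format beyond plain text; the helper name `context_before_matches` is made up for illustration.

```python
from collections import deque
from typing import Iterable, List


def context_before_matches(lines: Iterable[str], needle: str, before: int = 20) -> List[List[str]]:
    """For each line containing `needle`, return it with up to `before` preceding lines."""
    recent = deque(maxlen=before)  # rolling window of the most recent lines
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if needle in line:
            hits.append(list(recent) + [line])
        recent.append(line)
    return hits
```

For example, opening smartreporting.log.6 and passing the file object with `needle="maximum of 2100 parameters"` returns each occurrence of that error together with whatever was logged just before it.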

Still stuck? Ensure the client is running a 7.3/7.4 release from after Feb 2018. This includes:

  • A finite queue, which stops the server being overrun with waiting tasks that use up all the memory. Tasks are therefore rejected if there are too many already waiting.
  • Some extra logging options.
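
The finite-queue idea can be sketched as follows. This is a generic Python illustration of rejecting work when a bounded queue is full, not the actual Yellowfin implementation; the class name `BoundedTaskQueue` and its capacity are assumptions.

```python
import queue


class BoundedTaskQueue:
    """Hypothetical stand-in for a finite background queue."""

    def __init__(self, capacity: int) -> None:
        self._queue = queue.Queue(maxsize=capacity)
        self.rejected = []  # names of tasks that were turned away

    def submit(self, task_name: str) -> bool:
        """Queue the task, or reject it immediately if the queue is full."""
        try:
            self._queue.put_nowait(task_name)
            return True
        except queue.Full:
            self.rejected.append(task_name)
            return False

    def waiting(self) -> int:
        """Number of tasks currently waiting in the queue."""
        return self._queue.qsize()
```

Rejecting early keeps the number of waiting tasks, and so the memory they hold, bounded, at the cost of those tasks having to be picked up again on a later scheduler check.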

Want to enable extra debug logging specific to background tasks to delve deeper?

Add the following in log4j.properties.

NOTE: You may want to disable this after you have enough info, so you don't end up with massive log files.

# uncomment to set everything under com.hof to debug
#log4j.category.com.hof=DEBUG
log4j.category.com.hof.mi.util.background.TaskRunner=DEBUG
log4j.category.com.hof.mi.util.background.TaskRunnerCallback=DEBUG
log4j.category.com.hof.mi.util.background.TaskScheduler=DEBUG
log4j.category.com.hof.mi.util.background.TaskTypeUtil=DEBUG
log4j.category.com.hof.mi.util.background.TaskUncaughtExceptionHandler=DEBUG
log4j.category.com.hof.mi.servlet.ReportBroadcastTask=DEBUG

----

Hi Paul,

Thanks for that bit of extra information. I will relay this to the user.

Yes, the error itself is a limitation of MS SQL, but we were mainly trying to understand why this error caused the broadcasts to stop running, even for days (we have tasks running daily or more frequently).

Thanks again.

Regards,

Nick

Hi Nick,

No problemo. Let me know if you need anything else.

Cheers,

Paul

Hi Paul,

Thanks for the assistance.

This can be closed off.

Regards,

Nick

Hi Nick,


Glad to hear it. I'll mark this as completed.


Cheers,

Neal