R step without input possible?

JeRoen shared this question 4 months ago
Answered

Hello,

I want to run an R script that generates some data. The data will be coming from an RDS file (https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/readRDS).

So there is no input step to configure. The first step in the transformation flow is an R script. I do want to write the data to the writable YellowFin database and then report on it but I am not getting that far.

I have Rserve running on localhost and the connection is ok. I have set the path to the correct (and working) R file. Then I set the return field method to append (no input data) and the correct number of columns (9). The input variable name I leave blank and in the output variable name I define outData and that is also the data.frame name the script is returning.

Running it (Apply) gives me 9 fields (Field 0 ... to ... Field 8) but no data.

In the Error tab it says "This step has fields that are not linked to an input."

Is it possible to run an R script without any input data?

Kind Regards,

Jeroen

Comments (11)

Hi JeRoen,

Thanks for reaching out with your question. It's not currently possible to run a transformation flow without an input step. The recommended workaround, if you don't require one, is to use an input step that doesn't have high overhead, such as a freehand SQL statement: "SELECT 1 FROM TinyTable;"

This should allow you to execute the rest of the steps accordingly.

Thanks,

Ryan

Hello Ryan,

I am unable to get this to work.

I am working on localhost. I have started Rserve with the following code in RStudio:

library(Rserve)

Rserve(args="--no-save")


It says the daemon is started. The script I use is working correctly in RStudio.

I have a Transformation Flow in YellowFin containing the following: [screenshot]

The SQL is like you said a select 1 from a small table.

The R script is configured like this:

[screenshot]

The R script contains one row:

outData <- readRDS(file = '/users/xxxx/Downloads/test-set.rds')

The result of the R script is 9 variables and 37500 records, and the script creates a variable named outData. class(outData) reports it as a "data.frame".
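A quick sanity check of what the script returns can be done in plain R before pointing the step at it. The sketch below is only an illustration: it writes a small stand-in data frame to a temporary RDS file (the real test-set.rds is not available here), reads it back with the same call as the one-line script, and checks the class and column count that the step configuration expects. The helper name check_out_data and the sample columns are assumptions, not part of YellowFin or the original script.

```r
# Hypothetical stand-in for the real RDS file: 9 columns, 5 rows.
sample_data <- data.frame(id = 1:5, label = letters[1:5],
                          stringsAsFactors = FALSE)
for (i in 1:7) sample_data[[paste0("v", i)]] <- i * (1:5)

rds_path <- tempfile(fileext = ".rds")
saveRDS(sample_data, rds_path)

# Same call shape as the one-line script in the thread.
outData <- readRDS(file = rds_path)

# Hypothetical helper: confirm the object is a data.frame with the
# expected number of columns, then report the first class of each column.
check_out_data <- function(df, expected_cols) {
  stopifnot(is.data.frame(df), ncol(df) == expected_cols)
  vapply(df, function(col) class(col)[1], character(1))
}

col_classes <- check_out_data(outData, expected_cols = 9)
print(col_classes)
```

If the real file contains factor or list columns, they would show up in this listing. Whether such column types matter to the R step is not established in this thread.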

When I apply the above settings then after a few seconds in the tab Error an error shows.

[screenshot]


So the SQL fixed the error I was getting ("This step has fields that are not linked to an input.") but now another error shows.

But the one-row script is working fine in RStudio.

Rserve is indeed running on port 6311.

The preview for all steps is not working in Chrome on Mac, so I created the above images with Safari, which shows the preview for the steps correctly.

Regards,

Jeroen

Hi JeRoen,

Thanks for the update. Please tail the yellowfin.log file during this process and look for a stack trace. Can you kindly provide that here?

If you have trouble finding it I'm happy to assist.

Thanks,

Ryan

Hello Ryan,

The YellowFin.log shows the following. You can see that the SQL step named registratie is executed correctly and that the trouble begins when R wants to create a data frame.

When I run the script in RStudio and then check the class, it says it is a data frame.

> class(outData)

[1] "data.frame"

YellowFin.log

YF:2019-04-29 18:44:36: INFO (AbstractETLCachedStep:call) - Begin processing registratie

YF:2019-04-29 18:44:36: INFO (AbstractETLCachedStep:call) - Finished processing registratie in 289ms; Rows Processed: {1fbd6543-0966-45e3-a6a6-19fa4fded2f1=37}

YF:2019-04-29 18:44:36: INFO (AbstractETLCachedStep:call) - Begin processing R Script

YF:2019-04-29 18:44:36: INFO (RStep:endRows) - Initiating Rserve connection..

YF:2019-04-29 18:44:36: INFO (RStep:endRows) - Testing Rserve connection..

YF:2019-04-29 18:44:36: INFO (RLoader:processEndRows) - Initiating Rserve connection..

YF:2019-04-29 18:44:36: INFO (RLoader:processEndRows) - Testing Rserve connection..

YF:2019-04-29 18:44:36: INFO (RStep:endRows) - Rserve connection established.

YF:2019-04-29 18:44:36:ERROR (RStep:endRows) - Unable to create R dataframe out of provided table

org.rosuda.REngine.Rserve.RserveException: eval failed

at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:234)

at rstep.RStep.processEndRows(RStep.java:423)

at com.hof.mi.etl.step.AbstractETLCachedStep.endRows(AbstractETLCachedStep.java:122)

at com.hof.mi.etl.step.AbstractETLCachedStep.D(AbstractETLCachedStep.java:189)

at com.hof.mi.etl.step.AbstractETLCachedStep.endRows(AbstractETLCachedStep.java:125)

at com.hof.mi.etl.runner.ExecutionHead.call(ExecutionHead.java:29)

at com.hof.mi.etl.runner.ExecutionHead.call(ExecutionHead.java:15)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

YF:2019-04-29 18:44:36:ERROR (RStep:endRows) - com.hof.mi.etl.ETLException

YF:2019-04-29 18:44:36:ERROR (AbstractETLCachedStep:call) - Error

com.hof.mi.etl.ETLException

at rstep.RStep.processEndRows(RStep.java:432)

at com.hof.mi.etl.step.AbstractETLCachedStep.endRows(AbstractETLCachedStep.java:122)

at com.hof.mi.etl.step.AbstractETLCachedStep.D(AbstractETLCachedStep.java:189)

at com.hof.mi.etl.step.AbstractETLCachedStep.endRows(AbstractETLCachedStep.java:125)

at com.hof.mi.etl.runner.ExecutionHead.call(ExecutionHead.java:29)

at com.hof.mi.etl.runner.ExecutionHead.call(ExecutionHead.java:15)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)


Regards,

JeRoen

Hi JeRoen,

Thanks for providing this. I've forwarded it to the author of the R step for input and will update you when I have their reply.

Thanks,

Ryan

Hi JeRoen,

Apologies for the delay in response. The error you are receiving is caused by either the data or the column names passed from the input step. What are you using for this currently, and does this error persist if you try another type of input? It would be helpful if you could provide a screenshot of the previewed data for the input step.
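To illustrate how column names alone can be awkward on the R side, here is a minimal sketch in plain R. It does not show how the R step itself builds its data frame (that code is not part of this thread); it only shows which names R treats as syntactically invalid. The sample names are hypothetical, for example an unaliased "SELECT 1" column or a name containing a space.

```r
# Hypothetical column names as they might arrive from an input step.
incoming_names <- c("1", "my col", "ok_name")

# make.names() returns what R considers a syntactically valid equivalent.
safe_names <- make.names(incoming_names, unique = TRUE)

# Names that differ after make.names() are candidates for an explicit
# alias in the input step's SQL.
needs_alias <- incoming_names[incoming_names != safe_names]

print(data.frame(incoming = incoming_names, safe = safe_names))
print(needs_alias)
```

If a name shows up in needs_alias, giving the column an explicit alias in the input SQL is a cheap thing to try. Whether it resolves the "eval failed" error here is an open question.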

Nathan

Hello Nathan,

Here is the screenshot with the data result when I do "Run Step" from the menu ...

It is a single text field containing about 32 records.

Hi JeRoen,

Thanks for providing this. Nathan is traveling this week, on-site with a client, so it may be a short time before you get a response.

Regards,

Ryan

Hi JeRoen,

If you start the Rserve instance with the following command, we can see whether any error was thrown on the R side:


run.Rserve(args="--no-save")

Does this same error occur if you use a different input step such as a direct table read?
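Another way to surface an R-side error, sketched here under assumptions, is to wrap the script body in tryCatch and write the message to a log file. The wrapper name safe_load and the log path are hypothetical and not part of Rserve or YellowFin; the example deliberately reads a file that does not exist so that the error path runs.

```r
# Hypothetical log location; any path writable by the Rserve process works.
log_path <- file.path(tempdir(), "rstep-debug.log")

# Hypothetical wrapper: run a loader function, append any error message to
# the log file, and return an empty data.frame instead of failing.
safe_load <- function(loader, log_file) {
  tryCatch(
    loader(),
    error = function(e) {
      cat(format(Sys.time()), conditionMessage(e), "\n",
          file = log_file, append = TRUE)
      data.frame()
    }
  )
}

# Deliberately point at a missing file to exercise the logging path.
missing_file <- file.path(tempdir(), "does-not-exist.rds")
outData <- safe_load(function() readRDS(file = missing_file), log_path)

# Show what was logged.
print(readLines(log_path))
```

Note that this only catches errors raised inside the script itself. If the failure happens earlier, while the step is building its input data frame, the log stays empty, which is itself a useful hint.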

Nathan

Hello Nathan,

I lost my localhost install of YellowFin today (license expired), and as we have servers set up now, we want to work on those servers for our development. Unfortunately, there is not yet an R server set up on the server side to run scripts against. I did it on my localhost.

So at the moment I do not have a way to reproduce anything. I have also imported the data by running the script in RStudio, so the immediate need to have this fixed is gone. We will work with an R server in the mix, so we might run into this again when everything is set up, but for now we can put this on hold, or even close it if that fits your workflow.

If needed, I can always reopen this issue to investigate further.

Sorry for not following up now, but I will do that later on.

Kind regards,

JeRoen

Hi JeRoen,

Thanks for the update. I'll go ahead and mark this as Answered until you have time for further review. At that point, please feel free to re-open this topic and we can pick it up from there.

Thanks,

Ryan