Large CSV file hangs on "Processing File" indefinitely

Sam shared this problem 7 months ago
Resolved

I'm trying to import a .csv file that is 640,000 rows x 8 columns. Yellowfin shows a progress bar that resets a few times, then hangs on "Processing File" indefinitely. When I sample the .csv down to 1,000 rows, it imports almost instantly.

Any advice for importing the larger file (65MB)?

Comments (13)

Hi Sam,

Thanks for reaching out. As it happens, we replicated this exact issue and raised a defect for it just yesterday. That case involved an 800k-row CSV, but the behavior was the same, and everything works as expected with a smaller file.

Given this information, you have two options for now:

1. Break the CSV into smaller chunks

2. Wait for updates on the logged defect
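
If you go with option 1, the chunking can be scripted. Here's a minimal sketch in Python (file names and chunk size are placeholders, nothing Yellowfin-specific) that splits a large CSV into smaller files, repeating the header row in each chunk so every piece imports on its own:

```python
import csv

def write_chunk(prefix, part, header, rows):
    """Write one chunk file with the header row repeated at the top."""
    with open(f"{prefix}_{part:03d}.csv", "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)

def split_csv(path, rows_per_chunk=100_000, prefix="chunk"):
    """Split a large CSV into files of at most rows_per_chunk data rows each."""
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, part = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                write_chunk(prefix, part, header, chunk)
                chunk, part = [], part + 1
        if chunk:  # remainder smaller than a full chunk
            write_chunk(prefix, part, header, chunk)
```

For a 640k-row file, `rows_per_chunk=100_000` would produce seven files, each with its own header row.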

I've gone ahead and set this ticket to Defect Logged and I'll keep you posted on any updates from the dev team as they come along.

Regards,

Mike

Good to know, thanks Mike! I'll split up the CSV and that will work for now, then when this is addressed I'll go back and update with the full data set.

Hi Sam,

You're welcome! I'll keep you posted.

Regards,

Mike

Follow-up info: I cut the file down to 375k records. At that size, I can get to the stage where I retitle all the columns and specify data types. From there I got this error message:

[screenshot of the error message]

I tried re-importing that CSV with High Precision selected, which caused it to error at the start of the process:


Error: java.lang.OutOfMemoryError: Java heap space
At:
java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
java.lang.StringCoding.decode(StringCoding.java:193)
java.lang.StringCoding.decode(StringCoding.java:254)
java.lang.String.<init>(String.java:546)
java.lang.String.<init>(String.java:566)
com.hof.util.Base64.encodeFromStream(Base64.java:812)
...

I'll keep trimming it smaller, but wanted to share these additional error behaviors.

Hi Sam,

Thanks for keeping us posted. Large ETL transformations can use a lot of memory, and what you ran into here is an out-of-memory error. This suggests that even if the larger file were loading as expected, you wouldn't be able to complete the transformation anyway.

How much memory is currently assigned to your Yellowfin instance? You can find this value next to JVM Max Memory under System Information on the info.jsp page, which can be accessed by appending '/info.jsp' to your login URL (for example, localhost:8080/info.jsp). The same information is available under Administration > System Information.

Here is our minimum server requirements documentation, and here is how to increase JVM Max Memory, if that winds up being needed.

Please let me know.

Thanks,

Mike

JVM Max Memory: 910.5 MB

Hi Julius,

That is far below the recommended minimum of 15GB dedicated to Yellowfin in the application server. Please increase your JVM Max Memory to as close to that value as you can by following the steps in the article linked above, then let me know if you still run into this issue when transforming the 375k records.
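
The exact steps are in the article linked above, but as a general illustration: on a Tomcat-based Windows install, the heap ceiling is typically raised with the JVM's -Xmx flag, for example via a setenv.bat placed next to catalina.bat. The 8g below is only an example value; size it to your server:

```
rem setenv.bat -- read automatically by catalina.bat at startup
set JAVA_OPTS=%JAVA_OPTS% -Xmx8g
```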

Regards,

Mike

Hi Mike - just closing the loop here: once we updated the JVM Max Memory, the 375k-record subset imported successfully. So this is now back to just the original defect identified above, related to the larger number of records. Thanks for the help with the memory problem!

Thanks for the update Sam!

Hi Sam,

Just writing to let you know this defect has been fixed and will be published in the next build, 9.6, with an ETA of some time next month. I will let you know once it is published.

Regards,

Mike

Hi Sam,

Just writing to let you know that the build containing this fix, 9.6, is now published.

For your convenience, here are a couple direct links for this build:

9.6: .jar / .exe

You can also download the latest full installers here.

Please download the update installer for the latest build, test, and let us know if you run into any issues.

Regards,

Mike

Thanks Mike. First off: our team and Yellowfin support were never able to get 9.5 running successfully on our system, but 9.6 appears to work great. That in itself is a big help!

In terms of this large CSV bug, in my testing today I find:

  1. 65MB CSV, Low Precision: fails at the final step with a Java-related error (see below).
  2. 37MB CSV, Low Precision: fails and suggests using High Precision.
  3. 65MB and 37MB CSVs, High Precision: both succeed.

Here's the error message for #1:

Error: java.lang.UnsupportedClassVersionError: com/microsoft/sqlserver/jdbc/SQLServerPreparedStatement has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0 (unable to load class [com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement]).


The error message for case #2 is:

The CSV import encountered an error and returned the following message:


java.lang.NumberFormatException


Please address the issue and try importing again. A wide range of issues may be caused by importing the file using Low Precision, try the High Precision option.

Hi Sam,

Thanks for your response. Glad to hear 9.6 is working and that the import succeeds with High Precision enabled! Based on the errors you shared, it looks like issues are expected when using Low Precision. Let me check with the dev team to pin down more precisely what the difference is between Low and High Precision, and also the thinking behind offering the Low Precision option at all if we explicitly warn that "A wide range of issues may be caused by importing the file using Low Precision, try the High Precision option" in an error message.

These are the general differences between the two, for the record:

  • Low Precision is fast and examines only the first 1000 rows of the file.
  • High Precision is slower, depending on the size of the file, as it examines all rows.
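
To illustrate the trade-off, here is a toy type-inference routine in Python. This is only a sketch of the sampling idea, not Yellowfin's actual implementation:

```python
import csv
import io

def infer_column_type(rows, column, sample_size=None):
    """Guess a column's type from at most sample_size rows (None = scan all).

    Mirrors the Low vs High Precision trade-off: a small sample is fast,
    but can miss a stray non-numeric value further down the file.
    """
    kind = "NUMERIC"
    for i, row in enumerate(rows):
        if sample_size is not None and i >= sample_size:
            break  # Low Precision: stop after the sample
        try:
            float(row[column])
        except ValueError:
            kind = "TEXT"  # one bad value demotes the whole column
            break
    return kind

data = "amount\n1\n2\n3\noops\n5\n"
rows = list(csv.reader(io.StringIO(data)))[1:]  # skip the header

print(infer_column_type(rows, 0, sample_size=3))  # NUMERIC (sample misses "oops")
print(infer_column_type(rows, 0))                 # TEXT (full scan finds it)
```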

But I'll let you know what they come back with.

Regards,

Mike

Hi Sam,

Here's some feedback on your queries:

The Java error occurs because the driver was compiled for a newer Java version than the one Yellowfin is running on:

Error: java.lang.UnsupportedClassVersionError: com/microsoft/sqlserver/jdbc/SQLServerPreparedStatement has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0 (unable to load class [com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement]).

They need to run Yellowfin on a newer version of Java, or replace their SQL Server driver with one that is compatible with the Java version their Yellowfin instance uses.
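
For anyone wanting to check which Java release a driver jar targets: the class file format stores a major version in bytes 6-7 of every .class entry (52 corresponds to Java 8, 53 to Java 9). A quick sketch in Python; the jar path and usage below are placeholders, not the client's actual file name:

```python
import struct
import zipfile

def class_file_version(jar_path, class_name):
    """Return the class file major version of class_name inside jar_path."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        header = jar.read(entry)[:8]
    magic, _minor, major = struct.unpack(">IHH", header)
    assert magic == 0xCAFEBABE, "not a valid class file"
    return major  # 52 -> Java 8, 53 -> Java 9, ...

# Hypothetical usage:
# class_file_version("sqljdbc.jar",
#                    "com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement")
```

If this returns 53 while the JVM is Java 8 (class file version 52), you get exactly the UnsupportedClassVersionError quoted above.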

The errors with precision make sense.

"java.lang.NumberFormatException

Please address the issue and try importing again. A wide range of issues may be caused by importing the file using Low Precision, try the High Precision option."

Low Precision means that it only samples a small subset of the entire file to determine the data types of the columns. High Precision means that it scans all the values in the file.

The error occurs because a value in a column cannot be converted to a number. Had High Precision been used, it would have found this non-numeric value and made the column a Text column rather than a Numeric column. As a Text column, it wouldn't attempt the numeric conversion that is causing the error with the Low Precision import.

This all comes down to the client's CSV data being dodgy. If a column mostly holds numbers but one row contains text, then Low Precision will miss it and the import will fail. Solution: make sure the data is clean, or use High Precision scans.
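
To find the dodgy rows up front before re-importing, a small Python sketch along these lines can help (the file path and column index are placeholders):

```python
import csv

def find_non_numeric(path, column):
    """Yield (line_number, value) for rows whose `column` isn't numeric."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for lineno, row in enumerate(reader, start=2):
            try:
                float(row[column])
            except (ValueError, IndexError):
                # bad value, or row too short to have this column at all
                yield lineno, row[column] if column < len(row) else None
```

Running this over the supposedly numeric column reports exactly which line numbers would trip the NumberFormatException under a Low Precision import.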

One problem with the High Precision scan (in the presence of dodgy data) is that it might create a column as Text when the client assumes it should be a Number. That would stop the column from being used as a metric, and stop them from using the data in charts and so on.

Hopefully this clears things up, but please let me know if you have any other questions on this.

Regards,

Mike

Hello Mike,


Here is the current version of Java we are using, per our catalina.bat file:

set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_291

Could you advise? I do not believe we can use a higher version of Java here; is there someplace else we can update to get this to work?

Hi Julius,

Thanks for your response. That is indeed the latest Java 8 update, so perhaps the dev team is incorrect on that point; still, a driver/runtime mismatch of this kind is plausible given the error described. That said, considering the other information here about Low vs. High Precision functionality, and that High Precision is confirmed working on your end, is there a reason you can't continue using High Precision for these larger CSV files?

Regards,

Mike

Hi Mike,

We're mixing two problems here, sorry about that. Let's close this as fixed, given that High Precision is working and the Java error seems to be a separate problem. Thank you for the explanation of the precision parameter, and to the team for patching this in 9.6.

I'll open a separate ticket for the problem we're having with 9.6 and Java.

Thanks,

Sam

Hi Sam,

Sounds good. That's a good idea. Thanks for letting me know!

Regards,

Mike
