Large CSV file hangs on "Processing File" indefinitely
Resolved
I'm trying to import a .csv file that is 640,000 rows x 8 columns. Yellowfin shows a progress bar that resets a few times, then hangs on "Processing File" indefinitely. When I sample the .csv down to 1,000 rows, it imports almost instantly.
Any advice for importing the larger file (65MB)?
Hi Sam,
Thanks for reaching out. As it happens, we replicated this and raised a defect for it just yesterday. That case involved an 800k-row CSV, but the behavior was the same, and everything works as expected with a smaller file.
Given this information, there are technically two options:
1. Break the CSV into smaller chunks
2. Wait for updates on the logged defect
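For option 1, here's a minimal Python sketch of one way to chunk a large CSV while repeating the header row in each piece. The chunk size and output filenames are just examples, not anything Yellowfin requires:

```python
import csv

def split_csv(path, rows_per_chunk=100_000):
    """Split a large CSV into numbered chunk files, repeating the header row."""
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, part = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                _write_chunk(path, part, header, chunk)
                chunk, part = [], part + 1
        if chunk:  # flush the final partial chunk
            _write_chunk(path, part, header, chunk)

def _write_chunk(path, part, header, rows):
    # Writes e.g. data.csv.part0.csv, data.csv.part1.csv, ...
    with open(f"{path}.part{part}.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
```

Each chunk can then be imported separately and appended to the same table.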
I've gone ahead and set this ticket to Defect Logged and I'll keep you posted on any updates from the dev team as they come along.
Regards,
Mike
Good to know, thanks Mike! I'll split up the CSV and that will work for now, then when this is addressed I'll go back and update with the full data set.
Hi Sam,
You're welcome! I'll keep you posted.
Regards,
Mike
Follow-up info: I cut it down to 375k records. At that size, I can get to the stage where I retitle all the columns and specify data types. From there I got this error message:
I tried re-importing that CSV with High Precision selected, which caused it to error at the start of the process:
I'll keep trimming it smaller, but wanted to share these additional error behaviors.
Hi Sam,
Thanks for keeping us posted. Performing large ETL transformations can indeed use a lot of memory, and what you ran into here is an out-of-memory error. This makes me think that even if the larger file were loading as expected, you wouldn't be able to complete the transformation anyway. How much memory is currently assigned to your Yellowfin instance? You can find this next to JVM Max Memory under System Information on your info.jsp page, which can be accessed by appending '/info.jsp' to your login URL, for example localhost:8080/info.jsp. You can also find this info under Administration > System Information.
Here is our minimum server requirements documentation, and here is how to increase JVM Max Memory, if that winds up being needed.
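For reference, on a Tomcat-based install the JVM Max Memory ceiling is typically controlled by the -Xmx flag. The exact file and variable depend on your installation and platform, so treat this fragment as illustrative only and follow the linked article for the authoritative steps:

```shell
# Illustrative only: in a Tomcat setenv.sh (or equivalent service config),
# -Xms sets the starting heap size and -Xmx caps the maximum heap.
# Choose values per the minimum server requirements documentation.
export JAVA_OPTS="$JAVA_OPTS -Xms4g -Xmx12g"
```

A restart of the application server is needed for the new heap settings to take effect.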
Please let me know.
Thanks,
Mike
Hi Julius,
That is far below the recommended minimum, which is 15GB dedicated to Yellowfin in the application server. Please increase your JVM Max Memory to as close to that value as you can by following the steps in the article I linked previously and please let me know if you still run into this issue when trying to transform the 375k records.
Regards,
Mike
Hi Mike - just closing the loop here that once we updated the JVM Max Memory, the 375k records subset was imported successfully. So this is now back to just the original defect identified above related to a larger number of records. Thanks for the help with the memory problem!
Hi Sam,
Just writing to let you know this Defect has been fixed and will be published in the next build, 9.6. ETA some time next month. I will let you know once published.
Regards,
Mike
Hi Sam,
Just writing to let you know that the build containing this fix, 9.6, is now published.
For your convenience, here are a couple direct links for this build:
9.6: .jar / .exe
You can also download the latest full installers here.
Please download the update installer for the latest build and test and let us know if you run into any issues.
Regards,
Mike
Thanks Mike. First off, our team and Yellowfin support were never able to get 9.5 to run successfully on our system. And 9.6 appears to work great. That is itself a big help!
In terms of this large CSV bug, in my testing today I find:
Here's the error message for #1:
Error: java.lang.UnsupportedClassVersionError: com/microsoft/sqlserver/jdbc/SQLServerPreparedStatement has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0 (unable to load class [com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement]).
The error message for case #2 is:
The CSV import encountered an error and returned the following message:
java.lang.NumberFormatException
Please address the issue and try importing again. A wide range of issues may be caused by importing the file using Low Precision, try the High Precision option.
Hi Sam,
Thanks for your response. Glad to hear 9.6 is working and that the import works with High Precision enabled! Based on the nature of the errors, it looks like issues are expected while using Low Precision, but let me check with the dev team to determine more specifically what the difference is between Low and High Precision, and the thinking behind offering Low Precision at all if we explicitly warn that "A wide range of issues may be caused by importing the file using Low Precision, try the High Precision option" in an error message.
These are the general differences between the two, for the record:
But I'll let you know what they come back with.
Regards,
Mike
Hi Sam,
Here's some feedback on your queries:
The Java error occurs because the driver was compiled for a newer Java version than the one Yellowfin is running on:
Error: java.lang.UnsupportedClassVersionError: com/microsoft/sqlserver/jdbc/SQLServerPreparedStatement has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0 (unable to load class [com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement]).
They need to use a higher version of Java to run Yellowfin, or replace their SQL Server driver with one that is compatible with the version of Yellowfin they are using.
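To make the error concrete: a class file's major version is stored in bytes 6-7 of its header (big-endian), where 52 corresponds to Java 8 and 53 to Java 9. A minimal Python sketch for checking a class file extracted from a driver jar (the header bytes below are synthetic, not from the actual driver):

```python
import struct

def class_file_major_version(data: bytes) -> int:
    """Return the class-file major version from the first 8 header bytes:
    magic (0xCAFEBABE), minor version, major version. 52 = Java 8, 53 = Java 9."""
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    assert magic == 0xCAFEBABE, "not a class file"
    return major

# Synthetic header for a class compiled by Java 9 (major version 0x35 = 53):
header = bytes.fromhex("cafebabe00000035")
print(class_file_major_version(header))  # 53
```

A runtime that only recognizes versions up to 52 (Java 8) will refuse to load such a class, which is exactly the UnsupportedClassVersionError above.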
The errors with precision make sense.
"java.lang.NumberFormatException
Please address the issue and try importing again. A wide range of issues may be caused by importing the file using Low Precision, try the High Precision option."
Low Precision means that it only samples a small subset of the entire file to determine the data types of the columns. High Precision means that it scans all the values in the file.
The error occurs because a value in a column cannot be converted to a numeric value. Had high precision been used, it would have found this non-numeric value and made the column a Text column rather than a Numeric column. As a Text column, it wouldn't attempt the numeric conversion that is causing the error with the low-precision import.
This all comes down to the client's CSV data being dodgy. If a column holds numbers but one row contains text, a Low Precision scan can miss that row, and the import will fail. Solution: make sure the data is clean, or use high-precision scans.
One problem with the high-precision scan (given the presence of dodgy data) is that it might create a column as Text when the client assumes it should be a number. That would stop the column from being used as a metric, and prevent them from using the data in charts, etc.
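As a rough illustration of the difference described above, here's a minimal Python sketch of sampled vs. full-scan numeric type inference. This is not Yellowfin's actual implementation, just a toy model of the behavior:

```python
def infer_numeric(values, sample_size=None):
    """Decide whether a column is numeric by checking either a sample
    (low precision) or every value (high precision)."""
    scanned = values if sample_size is None else values[:sample_size]
    return all(_is_number(v) for v in scanned)

def _is_number(v):
    try:
        float(v)
        return True
    except ValueError:
        return False

col = ["1", "2", "3", "oops", "5"]
print(infer_numeric(col, sample_size=3))  # True  -> low precision guesses Numeric
print(infer_numeric(col))                 # False -> high precision sees "oops", types it Text
```

In the low-precision case the later attempt to convert "oops" to a number is what surfaces as the NumberFormatException; in the high-precision case the column is typed Text up front, so no conversion is attempted.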
Hopefully this clears things up, but please let me know if you have any other questions on this.
Regards,
Mike