CSV Import: numbers stored as text

Peter Flinn shared this question 2 years ago
Answered

When using csv as a data source, the number fields come in as text. The function to convert these to number results in an empty column. How do I ensure that these fields are properly formatted when importing?

Comments (10)

photo
1

Hi Peter,


Thank you for getting in touch. In order to understand exactly what is going wrong, it may be best if I try and replicate this issue from my end.

  • Could you please send me a copy of CSV you are trying to import?
  • What version/build are of Yellowfin are you using?


Typically, if Yellowfin is interpreting the field to be a text field, it is because there is a text object somewhere within the first 1000 rows. A second possibility is that the source of your CSV has enclosed your numeric fields in quotation.


If you can provide me with the CSV in question, I will be able to provide more case-specific suggestions.


Regards,

Nathan

photo
1

Unfortunately the information is confidential, so i can't send it. Its form an Excel 2016 spreadsheet exported as a csv, and I'm using 7.3. The sheet itself has the columns formatted as currency, so I suspect that I will need to go through them and format the columns as number then export as csv. Just interested to know if this experience is shared by other users, and what their solution was.

photo
1

Indeed, the currency fields were all enclosed in quotes as part of the Excel export.

photo
1

Hi Peter,


Once you removed the quotes, were the columns importing as expected? Unfortunately, Excel will often want to throw its own formatting into things, so it is not surprising that a bit of editing was required. (Financial records containing dollar signs and commas is another common one)


Please let me know if you have any additional questions on this issue.


Regards,

Nathan

photo
1

OK I've reached a partial solution.


If you export the sheet to csv directly, currency is formatted as text - its not a solution to try to remove the quotes, that would take days. OTOH, if you export to a new sheet THEN save as csv, the dates are formatted as numbers, which then can't be imported into YF as dates. How hard is this, I said to myself?


Solution is to format all currency as numbers in the original excel workbook, then save directly as csv. At least that way the dates are preserved as well as numbers. I'm a little surprised this hasn't been raised elsewhere already, to be honest.

photo
1

Hi Peter,

Where is the data/CSV originally coming from?

For the reasons that you have described, it is typically best to avoid Excel as an intermediary between the source of your data and its final storage location.

While Yellowfin does provide some functionality to adjust how fields are interpreted (you can convert any string into a date through the view level date converter), it is designed to work on top of databases where the data types (integer, varchar, decimal etc) are pre-defined, and the data itself has to match its defined type exactly in order to even be inserted into the database.

Personally, when cleaning, organizing, and formatting data for import into Yellowfin I will use Python scripts. For example, the following code removes all quotations from a CSV file, and takes less than a second to run. (Python also includes several data science libraries designed to facilitate operations such as matrix/vector algebra, similar to R)


  1. f=open("mydata.csv","r")
  2. g=open("mydata_edited.csv","w")
  3. for line in f:
  4. line=line.replace('"','')
  5. g.write(line)

One other note:

The date format that excel will naturally export to CSV can easily be converted into a Yellowfin date field:

eead9e4ac1cbae14a41c099fd0494a61be96f670507013126d8b6cba60409ffc

If you can provide me a few sample rows of your CSV, with made up values as to not expose confidential information, I can provide more specific suggestions!

Regards,

Nathan

photo
2

Looking at the array of data sources we have (all Excel workbooks), I have since decided to use a SQL server as you suggest. It's just too fiddly using csv files, too time consuming. Just though I'd post my discovery in case its of interest to others.


I appreciate your suggestions, too - thanks for those. Just looking at the approach I will take as regards the SQL server for our data store (and in the future, for clients) - probably a separate instance for this.

photo
1

still fiddly

photo
1

Hi Jim,

If this is something of concern could you please create a new ticket for this?

Regards,

Nathan

photo
photo
1

Hi Peter,


Thanks for the input, I will keep this in mind in case other clients run into a similar issue.


Employing SQL Server sounds like a good solution though. Please let me know if you run into any problems or have any additional questions!


Regards,

Nathan