How Strip Sync CSV Import Works (beta)

2010-03-24 20:00:00 -0400

Updated: 6/6/2012 This blog post is out of date, and the software referenced in it, Strip Sync, has been discontinued in favor of Strip for Windows and Strip for OS X. For more information on our data import format, please see this newer blog post.

As mentioned here previously, we’re on the cusp of starting the beta test of Strip Sync, our new desktop companion tool for Strip. Among the application’s features are CSV import and export. The intent of this article is to describe the format used on export and required on import. This format is subject to change, but it won’t be changing much. Any changes will be announced here on this blog and noted as updates at the end of this article.

The General Gist

Each row in the CSV corresponds to an Entry in your Strip database. One column indicates which Category the Entry belongs to, another gives the name of the Entry, and every other column is treated as a Field. The import process creates a new Entry for each row in the CSV file after the header row. Note: Bulk updating is now supported on SSM and requires the use of an EntryID column. Bulk updating will be made available soon in Strip Sync for Windows (SSW).

This is what a sample import CSV might look like:

Credit Card,Financial,3759 876613 21001,,"exp:12/12
Insurance Policy,Financial,3759 876613 21001,,secret: name of your first pet? spot,secret,1-800-123-4567,4,,mscott,
jordie laforge,trekkers,,,,,,,nextgeneration|deep space nine,,
kirk,trekkers,,,,,,,star trek,,
patrick stewart,trekkers,,,,,,,star trek|next\|generation|voyager,,
Shopping Website,Personal,,,,secret,,,,,

Header Row Required

Just like the subtitle says, a header row is currently required to describe the data in the spreadsheet you’re importing for Strip.

Header Specification

  • One column must be named “Entry”, and this is case-insensitive.
  • One column must be named “Category”, and this is also case-insensitive.
  • The name of every other column (for now) is considered the name of a Field.
  • “Guid” is no longer a restricted column name (this article previously said that no column should be named “Guid”).
  • One column may be named “EntryID” on SSM, making the row an update to an existing record.

When Strip Sync reads the header row of your import file, it looks up each Field name in your database to see if there’s already a label/type associated with it. If not, a Field with this label is created for you in your database, with the default mode set to “text”. You can change this setting to URL, or whatever you like, by editing your labels in Strip.
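To make the header rules above concrete, here is a minimal Python sketch of how a header row could be sorted into its required columns and its Field labels. This is not Strip Sync's actual code: the function name `classify_header` and the sample labels are hypothetical, and treating “EntryID” as case-insensitive is an assumption (the rules above only state that for “Entry” and “Category”).

```python
def classify_header(header):
    """Sort header names into required columns and Field labels.

    Returns (entry_index, category_index, entry_id_index_or_None, fields),
    where fields maps a column index to its Field label.
    """
    entry_idx = category_idx = entry_id_idx = None
    fields = {}
    for i, name in enumerate(header):
        key = name.strip().lower()
        if key == "entry" and entry_idx is None:
            entry_idx = i
        elif key == "category" and category_idx is None:
            category_idx = i
        elif key == "entryid" and entry_id_idx is None:
            # Assumption: EntryID is matched case-insensitively too.
            entry_id_idx = i
        else:
            # Every other column name is taken as a Field label.
            fields[i] = name.strip()
    if entry_idx is None or category_idx is None:
        raise ValueError('header must contain "Entry" and "Category" columns')
    return entry_idx, category_idx, entry_id_idx, fields


# Hypothetical header, for illustration only.
print(classify_header(["entry", "CATEGORY", "Username", "Password"]))
```

Any label returned in `fields` that the database does not already know about would then be created with the “text” mode, as described above.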

When a row contains an EntryID, Strip Sync looks up the Entry in your database and replaces its name, category, and fields using the data in the rest of the row.

Bulk Updating

To do bulk updates via CSV import, you need to get the unique identifiers for your entries! Simply use the export feature of Strip Sync to export a CSV file containing all records in your database, with their EntryIDs.

Row Processing

During import, after the header row has been read, Strip Sync begins cranking through all the other rows, creating new Entries using the data in each row (where bulk updating is supported, a row that carries an EntryID updates the existing Entry instead). Here’s how it works:

  • Strip Sync looks at the Category field and does a case-sensitive lookup to find a matching Category in your database. If no match is found, a new Category with this name is created.
  • The Entry column is used as the name of the new Entry, as indicated above.
  • For each additional column in the row:
    • If the column is empty, it is ignored.
    • If the column is not empty, a Field is created on the Entry, with a type/label corresponding to the column’s header name.
    • Field columns may contain multiple values, separated by the ‘pipe’ character, ‘|’. If multiple values are detected, multiple Fields will be created on the Entry, labeled according to the column’s header name.
    • If your Field needs to contain a pipe character as part of the Field value, you may escape it with a backslash character (i.e. ‘\|’).
  • If an Entry or Category column is blank, the entire import will be rolled back, and an error message will display detailing the problem and the line number where the problem was found.
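The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Strip Sync implementation: the names `split_values` and `import_rows` are hypothetical, rows are assumed to be already parsed into lists of strings, empty values between pipes are skipped, and the error reports a row number (counting the header as row 1) rather than a true file line number.

```python
def split_values(cell):
    """Split a cell on unescaped '|'; a backslash-escaped pipe stays in the value."""
    values, current = [], []
    i = 0
    while i < len(cell):
        ch = cell[i]
        if ch == "\\" and i + 1 < len(cell) and cell[i + 1] == "|":
            current.append("|")  # escaped pipe is part of the value
            i += 2
        elif ch == "|":
            values.append("".join(current))  # unescaped pipe ends a value
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    values.append("".join(current))
    return [v for v in values if v]  # assumption: empty values are skipped


def import_rows(header, rows):
    """Build entries from parsed rows; a blank Entry or Category aborts everything."""
    names = [h.strip().lower() for h in header]
    entry_idx, category_idx = names.index("entry"), names.index("category")
    entries = []
    for row_no, row in enumerate(rows, start=2):  # the header is row 1
        name = row[entry_idx].strip() if entry_idx < len(row) else ""
        category = row[category_idx].strip() if category_idx < len(row) else ""
        if not name or not category:
            # Nothing has been committed yet, so raising acts as the rollback.
            raise ValueError(f"row {row_no}: blank Entry or Category; nothing imported")
        fields = []
        for i, cell in enumerate(row):
            if i in (entry_idx, category_idx) or i >= len(header) or not cell:
                continue  # empty columns are ignored
            label = header[i].strip()
            fields.extend((label, value) for value in split_values(cell))
        entries.append({"entry": name, "category": category, "fields": fields})
    return entries


print(split_values("star trek|next\\|generation|voyager"))
```

Because `import_rows` only returns its list once every row has passed validation, a caller that writes to the database after the call gets the all-or-nothing behavior described in the last bullet.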

Our import and CSV processing is based on the scanning technique and EBNF outlined by Matt Gallagher to fully support properly escaped CSV data.

If one were to extend that EBNF definition to take into account our use of | to separate multiple field values, we think it would look like this:

file = [header lineSeparator] record {lineSeparator record}
header = name {separator name}
record = field {separator field}
name = field
field = escaped | nonEscaped
escaped = doubleQuote {innerField | separator | lineSeparator | twoDoubleQuotes} doubleQuote
nonEscaped = innerField
doubleQuote = '"'
twoDoubleQuotes = '""'
separator = ','
lineSeparator = ('\r' | '\n') {'\r' | '\n'}
innerField = textData { innerFieldSeparator | textData }
innerFieldSeparator = '|'
textData = {characters up to the next double quote character, un-escaped innerFieldSeparator, separator string, or lineSeparator}
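If you are preparing an import file with a script, the outer layer of this grammar (quoted fields, doubled double quotes, and separators or line breaks inside quotes) is ordinary escaped CSV. As a rough check, and assuming Python's standard `csv` module as a stand-in for our own parser, the sketch below reads a small made-up sample; the `parse_csv` helper and the sample data are hypothetical, and the pipe handling from the grammar is not covered here.

```python
import csv
import io


def parse_csv(text):
    """Parse escaped CSV text into a list of rows (lists of strings)."""
    # newline="" leaves line endings untouched so the csv module can
    # handle line breaks inside quoted fields itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Made-up sample: a quoted field with a comma, and one with doubled
# double quotes plus an embedded line break.
sample = 'Entry,Category,Note\r\n"Acme, Inc.",Work,"said ""hi""\nsecond line"\r\n'
rows = parse_csv(sample)
for row in rows:
    print(row)
```

The second row comes back as three fields, with the comma, the double quotes, and the line break preserved inside their field values.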

Obviously, commentary and corrections are welcome (as well as bug reports).

