0

Every 3 months, I have to load several fixed-width text files into our SQL Server 2005 database. They can have anywhere from 5 to 798 columns (seriously) and the columns change pretty regularly.

Here's an example of the fixed-width data file:

010001 2000040620000406001200004060220000406 010001 2000040620001116002 0020000406 010006 2007051620070516001200706220120070516 010006 2010071520100908002 0020100715 010006 2010071520100715001201007300120100715 010007 1993110919931109001199311230219931109 

With the text files, I also receive an "SAS" file that documents the columns and data types. Here's an example:

/************************************************************************ * Job Number: 119164 * Procedure Number: 6 * Record Length: 47 *************************************************************************/ INPUT @1 PRVDR_NUM $CHAR10. @11 SRVY_DT $CHAR8. /*YYYYMMDD*/ @19 CYC_VISIT_DT $CHAR8. @27 CYC_VISIT_NUM 3. @30 POC_SGN_DT $CHAR8. @38 LSC_BLDG_CNT 2. @40 SRVY_CMPLTN_DT $CHAR8. ; ***************************************************************; *** SAS LABEL Statement Follows ; ***************************************************************; LABEL PRVDR_NUM = 'CCN' SRVY_DT = 'Survey Date' CYC_VISIT_DT = 'Cycle Visit Date' CYC_VISIT_NUM = 'Cycle Visit Number' POC_SGN_DT = 'Administrator Signed POC Date' LSC_BLDG_CNT = 'Building Count' SRVY_CMPLTN_DT = 'Survey Completion Date' ; 

The person I get this file from calls it "SAS Input Code"

Is there a solution for loading the fixed-width file that can also use the SAS file to create the table? My current solution is too elaborate, and I really hope there's an alternative.

3
  • install python. install sas7bdat. from sas7bdat import SAS7BDAT with SAS7BDAT('\\path\\to\\file') as f: x = f.to_data_frame() x.to_excel('test.xlsx') Commented Jan 27, 2015 at 18:36
  • I don't know much about sas, but aren't the sas7bdat files a binary format? My source file is plain text. I'll check out that project regardless. Commented Jan 27, 2015 at 19:06
  • sorry. i didnt read well enough, i guess. good luck Commented Jan 29, 2015 at 4:00

1 Answer 1

1

I would write a python script that would,

  1. parse the columns descriptor file, get their widths and names
  2. iterate over all the data in the input file, splitting out the data into variables
  3. construct a sql query with these variables to the sql server with SQLAlchemy to be sent one by one or as a batch of rows or 1000 at a time

This script would take the columns descriptor file and the data file. The script would also have a database config file containing the database connection string or you can pass that as a third parameter.

I would write this for you, but I don't have time right now. This is a neat problem to solve. Python is perfect for this however. I wish my SAS problem was so easy.

2
  • I like this idea, and I wasn't aware of SQLAlchemy - thanks gxela! I talked with someone more familiar with SAS, and they said this "Input Code" could be used by SAS to directly import the data file. It seems like there's nothing out there to translate that into importing it into SQL. Oh, well. Commented Jan 29, 2015 at 14:21
  • Yeah, SAS is really behind on that part. We have been waiting on them to finish Hadoop SAS driver. They have it working half way, without any support for HDFS like they advertised. 6 months and 90k later, we are still waiting on them to finish development on the Hadoop driver. Commented Jan 29, 2015 at 17:32

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.