Load fixed width files

To load a fixed-width file, you need to know the width of each column. The sample below shows how a fixed-width file might represent orders.


00001    62024-02-05T21:19:15.454ZCancelled
00002    72024-02-05T21:19:15.454ZCancelled
00003    82024-02-05T21:19:15.454ZDelivered

In the file above, the columns are as follows:

  • order_id: 5 characters
  • customer_id: 5 characters
  • timestamp: 24 characters
  • status: 10 characters
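As an illustration of how these widths map onto a record, here is a minimal Python sketch that slices one line into named fields. It is not part of Starlake; the `WIDTHS` list and the `parse_line` helper are hypothetical names introduced for this example, and stripping the padding from each value is an assumption about how you would want to consume the fields.

```python
# Column widths from the list above, in file order.
# WIDTHS and parse_line are hypothetical names used only for illustration.
WIDTHS = [("order_id", 5), ("customer_id", 5), ("timestamp", 24), ("status", 10)]


def parse_line(line, widths=WIDTHS):
    """Slice one fixed-width record into a dict, stripping padding from each value."""
    record = {}
    offset = 0
    for name, width in widths:
        # Each field occupies exactly `width` characters starting at `offset`.
        record[name] = line[offset:offset + width].strip()
        offset += width
    return record


if __name__ == "__main__":
    sample = "00001    62024-02-05T21:19:15.454ZCancelled"
    print(parse_line(sample))
```

Because the fields are located purely by character offset, any change to a column width shifts every field after it, which is why the widths have to be known up front.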

Infer schema

The columns have fixed widths and are not separated by a delimiter. To infer the schema, we provide the infer-schema command with a file that contains a single record, where each field is placed on its own line, prefixed by its name and a colon.

To infer the schema of the fixed-width file above, we can submit the following file to the infer-schema command:

order_id:00001
customer_id:    6
timestamp:2024-02-05T21:19:15.454Z
status:Cancelled
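A file in this shape could be produced from the first data record with a short script. The sketch below is a hypothetical helper, not a Starlake feature: `build_descriptor` and `WIDTHS` are names made up for this example. It assumes each value should keep its padding so that it is exactly as wide as its column; the text above does not say how infer-schema treats padding, so treat that as an assumption to verify.

```python
# Hypothetical helper: turn one fixed-width record into "name:value" lines.
# Assumption: values keep their padding so each is exactly as wide as its column.
WIDTHS = [("order_id", 5), ("customer_id", 5), ("timestamp", 24), ("status", 10)]


def build_descriptor(line, widths=WIDTHS):
    """Return one 'name:value' line per column, with values padded to the column width."""
    lines = []
    offset = 0
    for name, width in widths:
        # ljust pads short trailing fields (e.g. a 9-character status) with spaces.
        value = line[offset:offset + width].ljust(width)
        lines.append(f"{name}:{value}")
        offset += width
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "00001    62024-02-05T21:19:15.454ZCancelled"
    print(build_descriptor(sample))
```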

Running the infer-schema command below infers the schema and prints it to the console.

$ starlake infer-schema -inputPath /my/path/fixed_width_file.txt --format FIXED --outputDir $SL_ROOT/metadata/load/starbake

The schema is saved in the directory specified by the outputDir parameter. The Starlake YAML schema looks like the one below:

load:
  metadata:
    format: FIXED
    writeStrategy:
      type: APPEND
  ...
  attributes:
    - name: order_id
      type: string
      position: # first 5 characters
        first: 0
        last: 4
    - name: customer_id
      type: string
      position: # next 5 characters
        first: 5
        last: 9
    - name: timestamp
      type: string
      position: # next 24 characters
        first: 10
        last: 33
    - name: status
      type: string
      position: # next 10 characters
        first: 34
        last: 43
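The first and last values in the schema are zero-based, inclusive character offsets that follow directly from the column widths. The Python sketch below shows the arithmetic; `positions` and `WIDTHS` are hypothetical names for this example and are not part of Starlake.

```python
# Hypothetical helper: derive zero-based, inclusive (first, last) offsets from widths.
WIDTHS = [("order_id", 5), ("customer_id", 5), ("timestamp", 24), ("status", 10)]


def positions(widths=WIDTHS):
    """Map each column name to its first and last character offsets."""
    result = {}
    offset = 0
    for name, width in widths:
        # A column of `width` characters starting at `offset` ends at offset + width - 1.
        result[name] = {"first": offset, "last": offset + width - 1}
        offset += width
    return result


if __name__ == "__main__":
    for name, pos in positions().items():
        print(name, pos["first"], pos["last"])
```

For the widths used here this gives 0 to 4, 5 to 9, 10 to 33, and 34 to 43, matching the schema above.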

Load data

After inferring the schema, we can use the load command to load the data into the data warehouse.

$ starlake load