Examples of the unix join command

Given the files:

::::::::::::::

1.txt

::::::::::::::

1 abc

2 lmn

3 pqr

::::::::::::::

2.txt

::::::::::::::

1 abc

3 lmn

9 opq

if we type:

join 1.txt 2.txt

We get

1 abc abc

3 pqr lmn

since the two files are trying to match on the 1st column, and both have a 1 and a 3 in that column. To tell join to use the second column to join with, we type:

join -1 2 -2 2 1.txt 2.txt (where "-1 2" stands for the 2nd filed of the 1st file, and "-2 2" stands for the 2nd field of the 2nd file) we get:

abc 1 1

lmn 2 3

which are just those fields that match in both files. This is refered to as an inner join. Inner joins look for rows that match rows in the other table. The problem with inner joins is that only rows that match between tables are returned.

If we type:

join -a1 -1 2 -2 2 1.txt 2.txt (where "-a1" says include all the records from the first file) we get:

abc 1 1

lmn 2 3

pqr 3

this is missing a number for pqr for the second file (since there is no pqr in that file) and is missing "9 opq" from the second file. This is an example of a left outer join, called such because it includes all of the rows from the left (or first) file specified.

If we type:

join -a1 -a2 -1 2 -2 2 1.txt 2.txt (adding "-a2" to tell join to also include all records from the second file) we get:

abc 1 1

lmn 2 3

opq 9

pqr 3

Which has all of the records. This is an example of a full outer join since it has all of the rows from both files. (Missing from these examples is the case where we had the -a2 without the -a1. That would have produced a right outer join which contained all of the records from the second (right) file and only those rows from the first (left) file that matched).

While this example has all of the rows from both files, we still have a problem since we can not tell which file the count is for on the last two records.

We now need to format the output using the "-o" command. The options for "-o" are as follows:

A "0" (that.s a zero) means display the join field

A number in the format of X.Y means to display the Y field from the X file (ex 2.1 means display the first field from the second file).

-e "0" says to replace any missing data fields with whatever is in-between the quotes (in this case a zero).

So if we type:

join -a1 -a2 -1 2 -2 2 -o 0 1.1 2.1 -e "0" 1.txt 2.txt

we get:

abc 1 1

lmn 2 3

opq 0 9

pqr 3 0

which has all of the records, and fills in the zeros for us.

Return to Ira's Home Page.