wc -l

Bruce Ediger

2024-05-27 (Last Modified: 2024-05-29)

Unix, and now Linux, have included a wc command for a very long time. Most explanations of its use are misleading fluff and garbage, and do not give you an appreciation of its true value.

The canonical example is “what size is a file?”

$ wc some_file.txt
  4173  18568 119709 some_file.txt

That’s 4173 lines, 18568 words, and 119709 bytes in the file. It’s also single purpose. Why bother with a special command to tell you less than ls -lt some_file.txt will tell you?

The line count function is where all the benefits lie.

$ grep bongrips4jesus.com massive_data_file.csv | cut -d, -f1,2 | awk ... > data1.csv
$ grep bongrips4jesus.com massive_data_file.csv | cut -d, -f1,2,5 | awk ... > data2.csv
$ wc -l data1.csv data2.csv
 1741 data1.csv
   54 data2.csv
 1795 total

Something went very wrong in the pipeline after you modified the cut command.

The underlying idea is that a count of lines is of interest, because a count of bytes is meaningless. You know in advance the second pipeline will produce different bytes because there’s an extra column in the data. You believe that both results should have the same number of lines. This short, quick check of your assumption shows your belief is incorrect.
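That check can be scripted. What follows is a minimal sketch, not a definitive tool: same_line_count is a hypothetical helper name, and the two small files are stand-in data created only for the demonstration.

```shell
# Hypothetical helper: succeed only if two files have the same number of lines.
same_line_count() {
    # Redirecting the file into wc makes it print only the number, with no
    # file name. The $(( )) wrapper strips any padding some wc versions add.
    a=$(( $(wc -l < "$1") ))
    b=$(( $(wc -l < "$2") ))
    [ "$a" -eq "$b" ]
}

# Stand-in data: three lines in one file, two in the other.
tmpdir=$(mktemp -d)
printf '1,a\n2,b\n3,c\n' > "$tmpdir/data1.csv"
printf '1,a,x\n2,b,y\n' > "$tmpdir/data2.csv"

if same_line_count "$tmpdir/data1.csv" "$tmpdir/data2.csv"; then
    echo "line counts match"
else
    echo "line counts differ"
fi
```

With the stand-in data above, this prints "line counts differ", the same signal the wc -l output gave by eye.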

The idea of a count of lines is just semantics. The pragmatics of wc -l is: it can count data that appear in a pipeline.
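A small sketch of that pragmatic point, using made-up input from printf rather than real data. With no file argument, wc -l reads standard input, so it can sit at the end of any pipeline. One detail worth knowing: it counts newline characters, so a final line with no trailing newline is not counted.

```shell
# Three newline-terminated lines go into the pipeline; wc -l counts them.
count=$(printf 'alpha\nbeta\ngamma\n' | wc -l)
echo "lines: $((count))"      # prints "lines: 3"

# The last line here has no trailing newline, so only one newline is seen.
partial=$(printf 'alpha\nbeta' | wc -l)
echo "lines: $((partial))"    # prints "lines: 1"
```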

You do keep your data in human-readable text format, don’t you?

The use of wc -l is to answer questions like “how many $X do I have?”

$ grep '^[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:' /etc/hosts | wc -l
2

I have 2 IPv6-addressable machine names in my laptop’s hosts file.

$ jq -r '.Dhcp4.subnet4[].reservations[]."ip-address"' /etc/kea/kea-dhcp4.conf | sort | uniq -c | awk '$1 > 1 {print $0}' | wc -l
0

I have no duplicate IPv4 addresses in my DHCP reservations.

The hard part of this pipeline is getting the jq extraction correct. Once you’ve extracted the correct pieces of data, finding unique IPv4 addresses and counting them is very easy. I think this indicates that JSON is a format that requires some Computer Science Parsing, but once you’ve got text representations of IP addresses one per line, you do not need parsing.
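To illustrate the part that needs no parsing, here is a sketch of the tail end of that pipeline run on made-up addresses, one per line. The name count_duplicates is a hypothetical helper, and the addresses are invented for the example; they do not come from any real Kea configuration.

```shell
# Hypothetical helper: count the distinct lines on standard input
# that occur more than once.
count_duplicates() {
    # sort groups identical lines, uniq -c prefixes each distinct line with
    # its count, awk keeps lines whose count exceeds 1, wc -l counts them.
    sort | uniq -c | awk '$1 > 1' | wc -l
}

# Made-up input: 10.0.0.5 appears twice, the others once.
dups=$(printf '%s\n' 10.0.0.5 10.0.0.6 10.0.0.5 10.0.0.7 | count_duplicates)
echo "duplicated addresses: $((dups))"    # prints "duplicated addresses: 1"
```

Putting the text-handling stages in a function keeps them separate from the extraction step, so any extractor that emits one address per line can feed it.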