If you work with text files on Linux, you may have encountered the awk command. Awk is a powerful tool for processing and manipulating text data. In this blog post, we will explain what awk is and how it works, and walk through some examples of how to use it.
If you are new to programming, I suggest taking a look at this thread first: linux-bash-1.
What is awk?
Awk is a programming language that was designed for text processing. It can perform various operations on text files, such as filtering, transforming, extracting, and reporting. Awk stands for Aho, Weinberger, and Kernighan, the names of its original authors.
How does awk work?
Awk works by reading one line of input at a time and applying a set of rules to it. Each rule consists of a pattern and an action. The pattern is a condition that determines whether the action should be executed or not. The action is a block of code that performs some operation on the input line.
For example, suppose we have a file called students.txt that contains the names and grades of some students:
Alice 90
Bob 80
Charlie 75
David 85
Eve 95
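If you want to run the examples in this post yourself, one way to create students.txt is with a shell here-document. This is just a convenience sketch; any text editor works equally well:

```shell
# Create the sample file used throughout this post
cat > students.txt <<'EOF'
Alice 90
Bob 80
Charlie 75
David 85
Eve 95
EOF
```

Quoting the EOF delimiter stops the shell from expanding anything inside the here-document, so the lines are written exactly as shown.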
To print the names of the students whose grade is greater than 80, we can run:
awk '$2 > 80 {print $1}' students.txt
The output will be:
Alice
David
Eve
The awk command consists of three parts: the program name awk, the awk script enclosed in single quotes, and the input file name. The script has one rule: ‘$2 > 80 {print $1}’. The pattern ‘$2 > 80’ selects lines whose second field (fields are separated by whitespace by default) is greater than 80. The action ‘{print $1}’ prints the first field of each selected line. Bob's grade of 80 is not greater than 80, so his name is not printed.
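Either part of a rule can be left out. With no action, awk prints every line that matches the pattern; with no pattern, the action runs on every line. A quick way to see both is the sketch below, which feeds a few of the sample lines to awk through printf instead of reading students.txt:

```shell
# Pattern only: the default action prints each matching line in full
printf 'Alice 90\nBob 80\nEve 95\n' | awk '$2 > 80'

# Action only: the action runs on every line, so every name is printed
printf 'Alice 90\nBob 80\nEve 95\n' | awk '{print $1}'
```

The first command prints "Alice 90" and "Eve 95"; the second prints all three names.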
Here are some basic examples of how to use awk:
Print a specific field in a file:
awk '{print $2}' filename
This will print the second field in each line of the file named filename.
Print several specific fields:
awk '{print $2, $4}' filename
This will print the second and fourth fields in each line of the file.
Print only lines that match a pattern:
awk '/pattern/ {print}' filename
This will print all lines that match the regular expression between the slashes; here, every line that contains the text “pattern” anywhere in it.
Count the number of lines in a file:
awk 'END {print NR}' filename
This will print the total number of lines in the file.
Calculate the average of a column of numbers:
awk '{sum += $3} END {print sum/NR}' filename
This will calculate the average of the third column in the file.
Combine awk with other commands:
ps -ef | awk '$1 == "root" {print}'
This will print all processes owned by the root user.
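To try these one-liners without a real data file, the sketch below writes a small made-up file and runs several of them against it. The file name sample.txt and its contents are purely illustrative:

```shell
# Illustrative data: four whitespace-separated fields per line
printf 'north 12 10 ok\nsouth 7 20 late\nnorth 3 30 ok\n' > sample.txt

awk '{print $2}' sample.txt                       # second field: 12, 7, 3
awk '{print $2, $4}' sample.txt                   # second and fourth fields, e.g. "12 ok"
awk '/north/ {print}' sample.txt                  # only the two lines containing "north"
awk 'END {print NR}' sample.txt                   # number of lines: 3
awk '{sum += $3} END {print sum/NR}' sample.txt   # average of column 3: (10+20+30)/3 = 20
```

In print $2, $4 the comma inserts the output field separator, which is a single space by default.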
Awk has many built-in variables and functions that can be used in patterns and actions. For example, NF is a variable that holds the number of fields in the current input line, and length() is a function that returns the length of a string. We can use them to print only the lines that have two fields and whose first field has more than five characters:
awk 'NF == 2 && length($1) > 5 {print}' students.txt
The output will be:
Charlie 75
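To check which lines a condition on NF and length($1) will select, it can help to print those values for every line first. The sketch below pipes the sample lines in with printf rather than reading students.txt:

```shell
# For each line, print the name, the number of fields (NF) and the name's length
printf 'Alice 90\nBob 80\nCharlie 75\nDavid 85\nEve 95\n' |
  awk '{print $1, NF, length($1)}'
```

Each output line looks like "Charlie 2 7": the name, its field count, and its length in characters.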
Awk also allows you to define your own variables and functions. For example, we can use a variable called total to store the sum of all grades, and define a function called average (with the function keyword) that divides a sum by a count:
awk 'function average(sum, count) { return sum / count } BEGIN {total = 0} {total += $2} END {print "Average grade:", average(total, NR)}' students.txt
The output will be:
Average grade: 85
The BEGIN and END blocks are special blocks that are executed before and after processing the input file, respectively. NR is another built-in variable that holds the number of records (lines) read so far.
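Because NR counts records as they are read, it works as a running line number in the main rule and as the total line count in the END block. The sketch below also shows the order in which BEGIN, the main rule, and END run, using a few sample lines piped in with printf:

```shell
# BEGIN runs once before any input, the middle rule once per line, END once at the end
printf 'Alice 90\nBob 80\nEve 95\n' |
  awk 'BEGIN {print "line name"} {print NR, $1} END {print "total:", NR}'
```

This prints a header line, then "1 Alice", "2 Bob", "3 Eve", and finally "total: 3".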
These are just some basic examples of how to use awk on Linux. Awk is a very versatile and expressive language that can handle complex text processing tasks. To learn more about awk, you can check out its manual page or online tutorials.