Command-Line Programming: Shell Programming

Resources:

MIT CSAIL: The Missing Semester Lesson 2, https://missing-semester-cn.github.io/2020/shell-tools/
https://www.tutorialspoint.com/unix/shell_scripting.htm

To be expanded in more detail…

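Naming shell variables:

This section covers the naming rules for variables in Unix shell programming. A variable is a character string to which we assign a value; that value can be a number, text, a filename, a device, or any other kind of data.
A variable is essentially a pointer to the actual data. The shell lets us create, assign, and delete variables.

A valid variable name may contain only letters (a-z, A-Z), digits (0-9), and the underscore, and must not begin with a digit. By convention, Unix shell variable names are written in uppercase. Characters such as !, *, and - cannot be used in names because they have special meaning to the shell.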

Defining variables:

VAR_NAME=variable_value
e.g:
NAME="Steve Junrong"

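A variable defined this way is called a scalar: it holds only one value at a time. No type specifier is needed, since the shell stores every value as a string and interprets it according to context. Note that no spaces are allowed around the equals sign, which follows from the way the shell splits a command line into arguments on whitespace.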
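
The assignment rules above can be sketched as follows. This is a minimal illustration, and the names GREETING, COUNT, and TOTAL are made up for the example.

```shell
#!/bin/sh
# Assignment: no spaces around '='. Quote values that contain spaces.
GREETING="Hello World"
COUNT=3

# With spaces, the shell would try to run a command named GREETING with
# the arguments '=' and 'Hello', so that form is left commented out:
# GREETING = Hello

# Values are stored as strings; COUNT is treated as a number only
# because the arithmetic expansion $(( )) asks for one.
TOTAL=$((COUNT + 2))

echo "$GREETING"   # Hello World
echo "$TOTAL"      # 5
```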

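Reading a variable's value:

To read the value of a variable, prefix its name with the $ operator. If the variable is viewed as a pointer to data, $ plays a role similar to dereferencing a pointer in C.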

#!/bin/sh

NAME="Zara Ali"
echo $NAME

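The output is Zara Ali.

To make a variable read-only, mark it with the readonly command. Once a variable is marked read-only, any later attempt to change it is an error.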

#!/bin/sh

NAME="Zara Ali"
readonly NAME
NAME="Qadiri"

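The assignment on the last line fails with an error such as:

/bin/sh: NAME: This variable is read only.

Deleting variables:

Use the unset command:

unset VAR_NAME

This removes the variable VAR_NAME, and its value can no longer be accessed afterwards. Note that unset cannot remove a read-only variable.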
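
A small sketch of both behaviors, with made-up variable names. The read-only attempt runs in a subshell because some shells abort the script when unset fails; the exact error text differs between shells, so it is discarded here.

```shell
#!/bin/sh
VAR_NAME="temporary"
unset VAR_NAME
# ${VAR_NAME-fallback} expands to "fallback" only when VAR_NAME is unset.
AFTER_UNSET="${VAR_NAME-fallback}"
echo "$AFTER_UNSET"       # fallback

# Try to unset a read-only variable inside a subshell, so that a failure
# cannot abort the main script.
if (readonly LOCKED="x"; unset LOCKED) 2>/dev/null; then
    UNSET_READONLY="allowed"
else
    UNSET_READONLY="refused"
fi
echo "$UNSET_READONLY"    # refused
```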

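Variable types:

When a shell is running, three main kinds of variables are present:

Local variables: a local variable exists only in the current instance of the shell. It is not available to programs started by that shell. Local variables are set at the command prompt.

Environment variables: an environment variable is available to every child process of the shell. Many programs need certain environment variables in order to work correctly. Usually a shell script defines only the environment variables required by the programs it runs.

Shell variables: a shell variable is a special variable that is set by the shell itself and that the shell needs in order to function correctly. Some of these are local variables and some are environment variables.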
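
The difference between a local variable and an environment variable can be observed by starting a child shell. This is a sketch; LOCAL_ONLY and EXPORTED are hypothetical names, and it assumes neither is already set in the environment.

```shell
#!/bin/sh
LOCAL_ONLY="only in this shell"
EXPORTED="visible to children"
export EXPORTED   # turns EXPORTED into an environment variable

# The single quotes delay expansion, so the child shell (not this one)
# expands the two variables.
CHILD_VIEW=$(sh -c 'echo "local=[$LOCAL_ONLY] exported=[$EXPORTED]"')
echo "$CHILD_VIEW"   # local=[] exported=[visible to children]
```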

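Arrays:

The shell also supports a different kind of variable, the array, which can hold several values at once. Array elements are assigned in much the same way as ordinary variables. Arrays are available in shells such as bash and ksh but are not part of POSIX sh. For example: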

NAME[0]="Zara"
NAME[1]="Qadir"
NAME[2]="Mahnaz"
NAME[3]="Ayan"
NAME[4]="Daisy"

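In bash, an array can also be initialized in one statement:

array_name=(value1 ... valuen)

Accessing an element takes a little more syntax: give the index in square brackets after the array name, wrap the whole reference in curly braces, and put $ in front to expand it.

${array_name[index]}

Using * or @ as the index expands to all elements of the array. In this position * is a special subscript, not a filename glob. The two forms differ only inside double quotes: "${array_name[*]}" yields a single word, while "${array_name[@]}" yields one word per element.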

#!/bin/bash

NAME[0]="Zara"
NAME[1]="Qadir"
NAME[2]="Mahnaz"
NAME[3]="Ayan"
NAME[4]="Daisy"
echo "First Method: ${NAME[*]}"
echo "Second Method: ${NAME[@]}"

This prints:

$ ./test.sh
First Method: Zara Qadir Mahnaz Ayan Daisy
Second Method: Zara Qadir Mahnaz Ayan Daisy
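
The two methods print the same line with echo, but they behave differently inside double quotes when the words are counted. The sketch below assumes bash; count_args is a helper invented for this example.

```shell
#!/usr/bin/env bash
NAME=("Zara Ali" "Qadir" "Mahnaz")

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

STAR_COUNT=$(count_args "${NAME[*]}")   # all elements joined into one word
AT_COUNT=$(count_args "${NAME[@]}")     # one word per element
LENGTH=${#NAME[@]}                      # number of elements in the array

echo "$STAR_COUNT $AT_COUNT $LENGTH"    # 1 3 3
```

For passing array elements on to another command, "${NAME[@]}" is usually the form to reach for, since it keeps elements containing spaces intact.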

Arithmetic operators:

Assume that variables a and b hold 10 and 20 respectively.

Operator	Description	Example
+ (Addition)	Adds values on either side of the operator	`expr $a + $b` will give 30
- (Subtraction)	Subtracts right hand operand from left hand operand	`expr $a - $b` will give -10
* (Multiplication)	Multiplies values on either side of the operator	`expr $a \* $b` will give 200
/ (Division)	Divides left hand operand by right hand operand	`expr $b / $a` will give 2
% (Modulus)	Divides left hand operand by right hand operand and returns remainder	`expr $b % $a` will give 0
= (Assignment)	Assigns right operand in left operand	a=$b would assign value of b into a
== (Equality)	Compares two numbers, if both are same then returns true.	[ $a == $b ] would return false.
!= (Not Equality)	Compares two numbers, if both are different then returns true.	[ $a != $b ] would return true.

Note that the multiplication sign must be escaped: with expr, only \* means multiplication, because an unescaped * would be expanded by the shell as a wildcard. Also, bash accepts == inside [ ], but the portable spelling is a single =.

A conditional expression must be placed inside square brackets, with spaces between the brackets, the variables, and the operator:

[ $a == $b ]	correct
[$a==$b]	wrong
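
As an alternative to expr, POSIX shells also provide arithmetic expansion with $(( )). The sketch below repeats the table's examples with a=10 and b=20; the result variable names are made up.

```shell
#!/bin/sh
a=10
b=20

SUM=$((a + b))
PRODUCT=$((a * b))      # no backslash needed inside $(( ))
QUOTIENT=$((b / a))
REMAINDER=$((b % a))

# The same product with expr, where * must be escaped.
EXPR_PRODUCT=$(expr "$a" \* "$b")

echo "$SUM $PRODUCT $QUOTIENT $REMAINDER $EXPR_PRODUCT"   # 30 200 2 0 200
```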

Relational operators:

Operator	Description	Example
-eq	Checks whether the values of two operands are equal; if yes, then the condition becomes true.	[ $a -eq $b ] is not true.
-ne	Checks whether the values of two operands are equal; if the values are not equal, then the condition becomes true.	[ $a -ne $b ] is true.
-gt	Checks if the value of left operand is greater than the value of right operand; if yes, then the condition becomes true.	[ $a -gt $b ] is not true.
-lt	Checks if the value of left operand is less than the value of right operand; if yes, then the condition becomes true.	[ $a -lt $b ] is true.
-ge	Checks if the value of left operand is greater than or equal to the value of right operand; if yes, then the condition becomes true.	[ $a -ge $b ] is not true.
-le	Checks if the value of left operand is less than or equal to the value of right operand; if yes, then the condition becomes true.	[ $a -le $b ] is true.

As with the operators above, the square brackets, operands, and operators must be separated by spaces.

Boolean operators:

Operator	Description	Example
!	This is logical negation. This inverts a true condition into false and vice versa.	[ ! $a -eq $b ] is true.
-o	This is logical OR. If one of the operands is true, then the condition becomes true.	[ $a -lt 20 -o $b -gt 100 ] is true.
-a	This is logical AND. If both the operands are true, then the condition becomes true otherwise false.	[ $a -lt 20 -a $b -gt 100 ] is false.

Special variables:

Variable	Description
$0	The filename of the current script.
$n	These variables correspond to the arguments with which a script was invoked. Here n is a positive decimal number corresponding to the position of an argument (the first argument is $1, the second argument is $2, and so on).
$#	The number of arguments supplied to a script.
$*	All the arguments. Inside double quotes, "$*" expands to a single word; if a script receives two arguments, it is equivalent to "$1 $2".
$@	All the arguments. Inside double quotes, "$@" expands each argument as a separate word; if a script receives two arguments, it is equivalent to "$1" "$2".
$?	The exit status of the last command executed.
$$	The process number of the current shell. For shell scripts, this is the process ID under which they are executing.
$!	The process number of the last background command.
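
A few of these variables in action. The same variables work inside a function, which is used here so that the sketch does not depend on how the script is invoked; show_args is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reports the argument count, then each argument.
show_args() {
    echo "count: $#"
    for arg in "$@"; do     # "$@" keeps "one two" as a single argument
        echo "arg: $arg"
    done
}

ARGS_REPORT=$(show_args "one two" three)
echo "$ARGS_REPORT"

# $? holds the exit status of the most recent command.
sh -c 'exit 3' || LAST_STATUS=$?
echo "$LAST_STATUS"         # 3
```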

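Conditional statements:

This section covers decision making in the Unix shell. When writing a shell script, we may need to choose between two paths. To do so we build a conditional expression and pick a path based on its value, which is what conditional statements are for.

The Unix shell supports two kinds of conditional statements: the if…else statement and the case…esac statement.

The if…else statement:

The if…else statement comes in three forms:

if...fi
if...else...fi
if...elif...else...fi

The syntax of the if…fi form is: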

if [ expression ]
then
	Statement(s) to be executed if expression is true
fi

Example:

#!/bin/sh
a=10
b=20

if [ $a == $b ]
then
	echo "a is equal to b"
fi

if [ $a != $b ]
then
	echo "a is not equal to b"
else
	echo "a is equal to b"
fi

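In bash, prefer double square brackets [[ ]] over single square brackets for comparisons. This reduces the chance of mistakes, although it is not compatible with sh.

The case…esac statement, for branching many ways on a single value:

There is only one form, comparable to switch in C.

case...esac

The syntax is shown below. Note that each pattern is followed directly by ), with no space in between.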
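
One kind of mistake that [[ ]] avoids is an unquoted empty variable. The sketch below assumes bash; the variable names are invented for the example.

```shell
#!/usr/bin/env bash
EMPTY=""

# Unquoted, [ $EMPTY = "x" ] would expand to [ = x ] and fail with an
# error, so the variable has to be quoted inside single brackets.
if [ "$EMPTY" = "x" ]; then SINGLE="match"; else SINGLE="no match"; fi

# Inside [[ ]] no word splitting happens, so the quotes are optional.
if [[ $EMPTY == "x" ]]; then DOUBLE="match"; else DOUBLE="no match"; fi

# [[ ]] also matches the right-hand side of == as a pattern.
FILE_NAME="notes.txt"
if [[ $FILE_NAME == *.txt ]]; then PATTERN="txt file"; else PATTERN="other"; fi

echo "$SINGLE / $DOUBLE / $PATTERN"   # no match / no match / txt file
```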

case word in
   pattern1)
      Statement(s) to be executed if pattern1 matches
      ;;
   pattern2)
      Statement(s) to be executed if pattern2 matches
      ;;
   pattern3)
      Statement(s) to be executed if pattern3 matches
      ;;
   *)
     Default condition to be executed
     ;;
esac

Example:

#!/bin/sh
FRUIT="kiwi"
case "$FRUIT" in
	"apple") echo "Apple pie is tasty"
	;;
	"banana") echo "I like banana nut bread"
	;;
	"kiwi") echo "New Zealand is famous for kiwi"
	;;
esac
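
Patterns in a case statement are not limited to fixed strings: they may use wildcards, and several alternatives can share one branch with |. A minimal sketch, with a hypothetical classify function:

```shell
#!/bin/sh
# Hypothetical helper: describes its first argument.
classify() {
    case "$1" in
        -h|--help)  echo "help" ;;                 # either spelling
        *.txt|*.md) echo "text file" ;;            # wildcard patterns
        [0-9]*)     echo "starts with a digit" ;;  # character class
        *)          echo "unknown" ;;              # default branch
    esac
}

classify --help     # help
classify notes.md   # text file
classify 42abc      # starts with a digit
classify other      # unknown
```

The first matching pattern wins, so the catch-all * belongs last.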

A handy application, parsing command-line options:

#!/bin/sh

option="${1}" 
case ${option} in 
   -f) FILE="${2}" 
      echo "File name is $FILE"
      ;; 
   -d) DIR="${2}" 
      echo "Dir name is $DIR"
      ;; 
   *)  
      echo "`basename ${0}`:usage: [-f file] | [-d directory]" 
      exit 1 # Command to come out of the program with status 1
      ;; 
esac

$ ./test.sh
test.sh:usage: [-f file] | [-d directory]
$ ./test.sh -f index.htm
File name is index.htm
$ ./test.sh -d unix
Dir name is unix

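Loops:

This section covers loops in the Unix shell. A loop is a powerful programming tool that lets us execute a set of commands repeatedly. Four kinds of loops are commonly available:

the while loop, the for loop, the until loop, and the select loop.

Which one to use depends on the situation: a while loop runs as long as its condition is true, whereas an until loop runs until its condition becomes true.

while loop:

The syntax of the while loop is shown below. The body is executed repeatedly for as long as command succeeds.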

while command
do
	Statement(s) to be executed if command is true
done

An example of a while loop:

#!/bin/sh
a=0

while [ $a -lt 10 ]
do
	echo $a
	a=`expr $a + 1`
done

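This example shows that the condition of a while loop is again written in square brackets, with whitespace separating the operator and the variables. The bodies of loops and branches are delimited by keywords (do…done, then…fi) rather than by braces as in C; the indentation is only for readability and, unlike in Python, has no syntactic meaning. A semicolon is not needed at the end of each command either.

Assignment takes the form a=b: a single equals sign with no spaces on either side, the variable being assigned on the left, and a literal value, a variable expansion, or a command substitution on the right. To do arithmetic on the right-hand side with expr, wrap the expr command in backticks, as here:

a=`expr $a + 1`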

for loop:

The syntax of the for loop:

for var in word1 word2 word3 ... wordN
do
	Statement(s) to be executed for every word
done

Example:

#!/bin/sh

for var in 0 1 2 3 4 5 6 7 8 9
do
   echo $var
done

The next example prints the names of all files and directories in the home directory whose names begin with .bash:

#!/bin/sh

for FILE in $HOME/.bash*
do
   echo $FILE
done

The output might look like this:

/root/.bash_history
/root/.bash_logout
/root/.bash_profile
/root/.bashrc

Until loop:

When we need to run a series of commands until a condition becomes true, we can use the until loop.

until command
do
   Statement(s) to be executed until command is true
done

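Here command is the condition: the body runs repeatedly while it is false. Because the condition is checked before each iteration, this is equivalent to while(!command), not to a do…while loop.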

#!/bin/sh

a=0

until [ ! $a -lt 10 ]
do
   echo $a
   a=`expr $a + 1`
done

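Select loop:

The select loop offers a simple way to build a numbered menu of choices. It is useful when we need the user to pick one item from a list of options.

The syntax is:

select var in word1 word2 ... wordN
do
   Statement(s) to be executed for every word.
done

var is the name of a variable, and the words are separated by spaces. It is rarely used, so it is not covered further here.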

Loop control:

break exits a loop. In nested loops, break n exits n levels of enclosing loops at once.

#!/bin/sh

a=0

while [ $a -lt 10 ]
do
   echo $a
   if [ $a -eq 5 ]
   then
      break
   fi
   a=`expr $a + 1`
done

#!/bin/sh

for var1 in 1 2 3
do
   for var2 in 0 5
   do
      if [ $var1 -eq 2 -a $var2 -eq 0 ]
      then
         break 2
      else
         echo "$var1 $var2"
      fi
   done
done

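The break 2 above exits two levels of loops, so the outer loop over var1 is terminated as well and the script ends immediately. The output is therefore:

1 0
1 5

continue works the same way: continue n skips the rest of the current iteration and resumes with the next iteration of the nth enclosing loop.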
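
A short sketch of continue and continue 2, mirroring the break examples. The variable names are made up for illustration.

```shell
#!/bin/sh
EVENS=""
for n in 1 2 3 4 5 6; do
    if [ $((n % 2)) -ne 0 ]; then
        continue            # skip odd numbers
    fi
    EVENS="$EVENS $n"
done
echo "even:$EVENS"          # even: 2 4 6

PAIRS=""
for outer in 1 2 3; do
    for inner in a b; do
        if [ "$inner" = "b" ]; then
            continue 2      # resume with the next value of outer
        fi
        PAIRS="$PAIRS $outer$inner"
    done
done
echo "pairs:$PAIRS"         # pairs: 1a 2a 3a
```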

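Details and miscellaneous notes (from The Missing Semester Lesson 2 notes):

When assigning a variable in bash, there must be no spaces around the equals sign. For example:
foo = bar	(wrong)
foo=bar	(correct)
In the first form, bash tries to run a program named foo with = and bar as its arguments.
Strings in bash are delimited by single or double quotes. A string in single quotes, such as '$foo', is literal and variables in it are not expanded, whereas in a string in double quotes, such as "$foo", variables are replaced by their values.
foo=bar
echo "$foo"
# prints bar
echo '$foo'
# prints $foo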

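For how bash interprets special characters, see https://www.tldp.org/LDP/abs/html/special-chars.html. The most common ones are:

$0 The name of the script; it can be thought of as argument number 0, the name of the script being run.
$1 ~ $9 Arguments 1 through 9 passed to the script, that is, the first to ninth arguments the user typed in the shell.
$@ All the arguments.
$# The number of arguments.
$? The exit status of the previous command: 0 means success and a nonzero value means failure.
$$ The process ID (PID) of the current script.
!! The entire previous command, including its arguments. A common use: if a command failed for lack of permissions, run sudo !! to repeat it.
$_ The last argument of the previous command.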

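The operators || and && combine commands according to their success or failure, and both are short-circuiting: the right-hand command of || runs only if the left one fails, and the right-hand command of && runs only if the left one succeeds.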
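
The short-circuit behavior can be traced with the true and false commands. In this sketch LOG is a made-up variable that records which right-hand sides actually ran.

```shell
#!/bin/sh
LOG=""
false || LOG="$LOG or-ran"       # left side failed, so the right side runs
true  || LOG="$LOG or-skipped"   # left side succeeded, right side is skipped
true  && LOG="$LOG and-ran"      # left side succeeded, so the right side runs
false && LOG="$LOG and-skipped"  # left side failed, right side is skipped
echo "$LOG"                      #  or-ran and-ran
```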
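When a command is run in the form $( CMD ), its output replaces $( CMD ). For example, in for file in $(ls), the shell first calls ls and then iterates over the values it returned. A lesser-known, similar feature is process substitution: <( CMD ) runs CMD, places its output in a temporary file, and replaces <( CMD ) with the name of that file. This is useful when a command expects its input in a file rather than on STDIN. For example, diff <(ls foo) <(ls bar) shows the differences between the files in the directories foo and bar.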
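
Both substitutions can be tried safely in a scratch directory. The sketch below assumes bash (process substitution is not available in POSIX sh) and creates its own foo and bar directories under a temporary path; the file names are made up.

```shell
#!/usr/bin/env bash
WORK_DIR=$(mktemp -d)
mkdir "$WORK_DIR/foo" "$WORK_DIR/bar"
touch "$WORK_DIR/foo/a.txt" "$WORK_DIR/foo/b.txt" "$WORK_DIR/bar/a.txt"

# Command substitution: the output of ls replaces $( ... ).
FOO_FILES=$(ls "$WORK_DIR/foo")

# Process substitution: each <( ... ) is replaced by a file name that
# diff can read. Lines starting with '<' exist only in the first listing.
# '|| true' keeps the assignment from failing if grep finds nothing.
ONLY_IN_FOO=$(diff <(ls "$WORK_DIR/foo") <(ls "$WORK_DIR/bar") | grep '^<' || true)

echo "$FOO_FILES"
echo "$ONLY_IN_FOO"     # < b.txt

rm -r "$WORK_DIR"
```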

The >> operator in bash appends content to the end of a file. For example:
echo "# foobar" >> "$file" appends the line # foobar to the end of the file named by the variable file. Because $file is inside double quotes, it is replaced by the actual file name.

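7. In bash, prefer double square brackets `[[ ]]` over single square brackets `[ ]` for comparisons. This reduces the chance of mistakes, although it is not compatible with `sh`. A more detailed explanation is available [here](http://mywiki.wooledge.org/BashFAQ/031).

8. Wildcards - to match with wildcards, use `?` to match exactly one character and `*` to match any number of characters. For example, given the files `foo`, `foo1`, `foo2`, `foo10`, and `bar`, the command `rm foo?` deletes `foo1` and `foo2`, whereas `rm foo*` deletes everything except `bar`.

9. Curly braces `{}` - when a series of commands share a common substring, curly braces expand them automatically. This is very handy when moving or converting files in bulk.

10. Using the env command in the shebang line is good practice. Rather than hard-coding the absolute path of the interpreter, it looks the interpreter up through the PATH environment variable. A shebang using env looks like this:

    ```shell
    #!/usr/bin/env python
    ```

A script that appears in the exercises: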

#!/usr/bin/env bash
    
n=$(( RANDOM % 100 )) # RANDOM is a bash shell variable that yields a random integer each time it is read

if [[ n -eq 42 ]]; then # n happens to be 42
    echo "Something went wrong"
    >&2 echo "The error was using magic numbers" 
    exit 1
fi
    
echo "Everything went according to plan"

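Some special cases and details of shell programming:

>	Plain redirection: sends stdout to the file on the right
2>	Sends stderr to the file on the right
>& filename	Sends both stdout and stderr to the file filename

An explanation of one special case:
>&2 echo "error"
This duplicates file descriptor 2 onto file descriptor 1. After the redirection, descriptors 1 and 2 both refer to whatever descriptor 2 originally pointed to, so the output of echo goes to stderr.
An interesting article on redirection: https://web.archive.org/web/20230315225157/https://wiki.bash-hackers.org/howto/redirection_tutorial
http://www.tldp.org/LDP/abs/html/io-redirection.html
https://unix.stackexchange.com/questions/159513/what-are-the-shells-control-and-redirection-operators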

program &>> result.txt
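is equivalent to
program >> result.txt 2>&1

&: here it stands for both the standard output stream (1>) and the standard error stream (2>)
>>: the double greater-than sign, which looks like a right-shift operator, appends the output to the end of the file in a bash script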
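
The redirections above can be checked with a small function that writes one line to each stream. This is a sketch using temporary files; report and the file variables are hypothetical names, and the portable 2>&1 spelling is used instead of bash's &>>.

```shell
#!/bin/sh
# Hypothetical helper: one line to stdout, one line to stderr.
report() {
    echo "normal output"
    echo "error output" >&2
}

OUT_FILE=$(mktemp)
ERR_FILE=$(mktemp)
BOTH_FILE=$(mktemp)

report > "$OUT_FILE" 2> "$ERR_FILE"   # split the two streams
report >> "$BOTH_FILE" 2>&1           # append both streams to one file

STDOUT_ONLY=$(cat "$OUT_FILE")
STDERR_ONLY=$(cat "$ERR_FILE")
BOTH_LINES=$(wc -l < "$BOTH_FILE" | tr -d ' ')

echo "$STDOUT_ONLY | $STDERR_ONLY | $BOTH_LINES"   # normal output | error output | 2
rm -f "$OUT_FILE" "$ERR_FILE" "$BOTH_FILE"
```

The order matters in the second call: 2>&1 must come after >> so that stderr is pointed at the file rather than at the original stdout.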

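A pipe connects the standard output of one command to the standard input of the next. xargs builds and executes command lines from standard input. For example, ls | xargs rm deletes all the (non-hidden) files in the current directory, provided their names contain no whitespace.
In a Linux shell the semicolon acts as a command separator, much like pressing ENTER to run a command. A command followed by ; runs in the foreground, and the shell waits for it to finish before running what comes next, whereas a command followed by & runs in the background.
A shell function is defined in the following form:
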
pidwait(){
	...
}

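A shell function declares no parameters. The arguments given when the function is called are available inside it as $1, $2, … $9, in the same way that a script sees the arguments typed by the user.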
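
A minimal sketch of a function that reads its own arguments and reports failure through its return status. greet is a hypothetical function written for this example.

```shell
#!/bin/sh
greet() {
    # Inside the function, $1 and $# refer to the function's own arguments.
    if [ "$#" -lt 1 ]; then
        echo "usage: greet NAME" >&2
        return 1
    fi
    echo "Hello, $1"
}

GREETING=$(greet "Zara")
echo "$GREETING"    # Hello, Zara

# The return status can be used directly as a condition.
if greet 2>/dev/null; then NO_ARG="ok"; else NO_ARG="failed"; fi
echo "$NO_ARG"      # failed
```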

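When the success of a command is itself the condition, no square brackets are needed. The if and while keywords simply run the command that follows them and branch on its exit status: 0 (the command succeeded) counts as true and any nonzero value (the command failed) counts as false. The square bracket is just another name for the test command, which is what we use when the condition is an expression. So when a command serves as the condition, there is no need to wrap it in a test; while checks the exit status of the command directly.
A trick for checking whether a process exists is kill -0. With the -0 option, kill sends no signal, so the process is not interrupted. kill -0 exits with status 0 if the process exists (and we are allowed to signal it), and with a nonzero status otherwise. The following is from the exercises of The Missing Semester Lesson 5: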

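#!/bin/sh
# Wait until the process whose PID is given as $1 has exited.
pidwait(){
	# kill -0 sends no signal; it only reports whether the process exists.
	while kill -0 "$1" 2>/dev/null
	do
		echo "waiting for process PID:$1"
		sleep 10
	done
}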
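
The kill -0 check can be tried on a background job started by the script itself. This is a sketch: the sleep command stands in for any long-running process, and the variable names are made up.

```shell
#!/bin/sh
sleep 30 &
BG_PID=$!     # $! is the PID of the last background command

if kill -0 "$BG_PID" 2>/dev/null; then ALIVE_BEFORE="yes"; else ALIVE_BEFORE="no"; fi

kill "$BG_PID"                        # now actually terminate it
wait "$BG_PID" 2>/dev/null || true    # reap it; wait reports the signal status

if kill -0 "$BG_PID" 2>/dev/null; then ALIVE_AFTER="yes"; else ALIVE_AFTER="no"; fi

echo "$ALIVE_BEFORE $ALIVE_AFTER"     # yes no
```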

To store the output of a command in a variable, use command substitution: put the command inside $( ). For example:

files=$(ls -a $1 | grep -E '.[^.]+' | grep -v .git)

The complete script:

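#!/bin/bash
# List the entries of directory $1, dropping ".", "..", and .git from the output of ls -a.
files=$(ls -a "$1" | grep -E '.[^.]+' | grep -v .git)
# Unquoted $files is split on whitespace, so names must not contain spaces.
for file in $files; do
    ln -s "$1/$file" ~/"$file" # create a symbolic link
done

~ $ source autoconfig.sh 
# Run the script to create symbolic links in the home directory ~ for the configuration files in dotfiles

Note: to use the output of a pipeline of programs as the value of a variable, write var=$(proc1 … | proc2 … | proc3 …).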

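Backticks (``) capture the output of a shell command so that it can be stored in a variable or used in another command: the command is run and its output is substituted in place. In short, backticks form a bridge between two commands, letting the behavior of the second depend on the result of the first, which is very useful in shell programming. For example:
gedit `grep -l "Linuxhint.dev" *.txt`
This passes the output of grep to gedit as arguments. Another example:
#!/bin/sh
DATE=`date`
echo "You have accessed this script on $DATE"
Here the DATE variable holds the output of the date program, a string containing the current date and time. We then call echo, which prints the value of the variable as part of its string.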
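
Backticks and $( ) do the same job, but they differ when substitutions are nested: inner backticks must be escaped, while $( ) nests without extra syntax. A small sketch; the path /usr/local/bin/tool is only an illustrative string and does not need to exist.

```shell
#!/bin/sh
# Nested backticks: the inner pair has to be escaped with backslashes.
WITH_BACKTICKS=`basename \`dirname /usr/local/bin/tool\``

# Nested $( ): no escaping needed, and quoting stays readable.
WITH_DOLLAR=$(basename "$(dirname /usr/local/bin/tool)")

echo "$WITH_BACKTICKS $WITH_DOLLAR"   # bin bin
```

This is one reason the $( ) form is generally preferred in new scripts.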