Define Errors Out Of Existence

Key points of this chapter:

  • Why exceptions contribute disproportionately to complexity
  • How to simplify exception handling
  • Goal: reduce the number of places where exceptions must be handled

Exception handling is one of the worst sources of complexity in software systems.

Exception handling is one of the sources of system complexity.

Code that deals with special conditions is inherently harder to write than code that deals with normal cases, and developers often define exceptions without considering how they will be handled.

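inherently: intrinsically, by its very nature
Writing code for exceptional conditions is inherently harder than writing code for the normal case; on top of that, developers often define exceptions without thinking about how they will be handled.
Using exceptions irresponsibly makes a system more complex.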

The key overall lesson from this chapter is to reduce the number of places where exceptions must be handled;
in many cases the semantics of operations can be modified so that the normal behavior handles all situations and there is no exceptional condition to report.

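The core idea is to reduce the number of places where exceptions must be handled.

I understood the first half of the second sentence: the semantics of an operation can be modified.
The second half was less clear to me. I could not tell whether mutable method semantics force the code to handle every situation, or whether changing the semantics is what lets the normal behavior cover every situation.
Leaving this as an open question.
20220102: The main point is to reduce exceptional cases by changing the semantics. For special cases it is hard to cover everything with a single semantic definition. Error cases that cannot be removed by changing the semantics can be masked or aggregated, so that the number of places that handle exceptions is kept to a minimum, which lowers system complexity.
The takeaway is to think harder about whether a small change in definition can place potential errors outside the semantics of the operation altogether.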

10.1 Why exceptions add complexity

exception - any uncommon condition that alters the normal flow of control in a program.
Many programming languages include a formal exception mechanism that allows exceptions to be thrown by lower-level code and caught by enclosing code.
However, exceptions can occur even without using a formal exception reporting mechanism, such as when a method returns a special value indicating that it didn't complete its normal behavior.

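This sets the scope of "exception": it is not only something that is thrown; a program that fails to complete its normal flow is also a kind of exception.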
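To make the two forms concrete, here is a minimal Python sketch. The function names are hypothetical and chosen only for illustration. Both functions report the same uncommon condition, one through a special return value and one through the formal exception mechanism, and in both cases the caller needs extra code.

```python
from typing import Optional


def find_index_or_none(items: list[str], target: str) -> Optional[int]:
    """Report "not found" with a special return value instead of raising."""
    for position, item in enumerate(items):
        if item == target:
            return position
    return None  # special value: the normal behavior did not complete


def find_index_or_raise(items: list[str], target: str) -> int:
    """Report the same condition with the formal exception mechanism."""
    return items.index(target)  # list.index raises ValueError when absent


names = ["ann", "bob"]

# Both callers need extra code for the uncommon case.
position = find_index_or_none(names, "eve")
if position is None:
    print("not found (special return value)")

try:
    find_index_or_raise(names, "eve")
except ValueError:
    print("not found (thrown exception)")
```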

Ways to encounter exceptions

  • A caller may provide bad arguments or configuration information.
  • An invoked method may not be able to complete a requested operation.
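  • In a distributed system, network packets may be lost or delayed, servers may not respond in a timely fashion, or peers may communicate in unexpected ways.
  • The code may detect bugs, internal inconsistencies, or situations it is not prepared to handle.
In short: bad arguments, requests that cannot be completed, lost packets, unreachable systems, bugs in the code, and so on.
timely: promptly, in good time
peer: another party of equal standing in a communication
detect: to discover, to find out
inconsistency: a lack of agreement between parts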

Large systems have to deal with many exceptional conditions, particularly if they are distributed or need to be fault-tolerant. Exception handling can account for a significant fraction of all the code in a system.

Exception handling accounts for a large share of all the code.

Exception handling is more complicated

Exception handling code is inherently more difficult to write than normal-case code.
It usually means that something didn't work as expected.
When an exception occurs, the programmer can deal with it in two ways, each of which can be complicated.

Two ways to deal with an exception

The first approach is to move forward and complete the work in progress in spite of the exception.
The second approach is to abort the operation in progress and report the exception upwards.
Aborting can be complicated because the exception may have occurred at a point where system state is inconsistent (a data structure might have been partially initialized); the exception handling code must restore consistency, such as by unwinding any changes made before the exception occurred.

There are two main ways to handle an exception. One is to keep going or retry; the other is to abort the operation and report the exception upward, which adds complexity because consistency must be restored first.
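A small Python sketch of the second approach, abort and unwind. The transfer function and the account names are hypothetical. The point is that the handler has to undo the partial update before it reports the exception upward.

```python
def transfer(balances: dict[str, int], source: str, target: str, amount: int) -> None:
    """Move `amount` between accounts, restoring state if the operation aborts."""
    snapshot = dict(balances)  # remember the last consistent state
    try:
        balances[source] -= amount
        if balances[source] < 0:
            raise ValueError("insufficient funds")
        balances[target] += amount  # KeyError if the target account is unknown
    except Exception:
        # Unwind the partial update, then report the exception upward.
        balances.clear()
        balances.update(snapshot)
        raise


accounts = {"a": 10, "b": 0}
try:
    transfer(accounts, "a", "missing", 5)
except KeyError:
    print("aborted, state restored:", accounts)
```

Even this tiny example needs a snapshot and a restore step that the normal path never uses, which is the extra complexity the chapter describes.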

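Exception handling code creates opportunities for more exceptions.

  • Consider the case of resending a lost network packet.
  • Consider the case of recovering lost data from a redundant copy: what if the redundant copy has also been lost?
  • Consider an exception handled by aborting the operation in progress.
Case 1: resending a lost message can cause the receiver to get duplicate requests, so the receiver has a new exceptional condition to handle.
Case 2: lost data is recovered from a redundant copy, but what if the redundant copy is lost too?
A secondary exception that occurs during recovery is usually more subtle and complex than the primary one.
Case 3: the operation is aborted, and this must then be reported to the caller as yet another exception.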

To prevent an unending cascade of exceptions, the developer must eventually find a way to handle exceptions without introducing more exceptions.

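Language support

Language support for exceptions tends to be verbose and clunky, which makes exception handling code hard to read.

verbose: wordy, long-winded
clunky: heavy, awkward

For example, with object serialization and deserialization in Java, the try { } catch { } boilerplate produces more code than the normal-case operation itself. It also becomes hard to tell which exception is generated by which line of code.
There is an alternative: break the code up into separate try blocks, with one try around each line that can generate an exception.
This makes it clearer where each exception comes from, but the try blocks fragment the flow of the code and make it harder to read. In addition, some exception handling code ends up duplicated across several try blocks.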

It's difficult to ensure that exception handling code really works.
Some exceptions, such as I/O errors, can't easily be generated in a test environment, so it's hard to test the code that handles them.

"Code that hasn't been executed doesn't work."

In other words, any code that has never actually run should be assumed broken.

A recent study found that more than 90% of catastrophic failures in distributed data-intensive systems were caused by incorrect error handling.
When exception handling code fails, it's difficult to debug the problem, since it occurs so infrequently.

This shows how much correct exception handling matters.

10.2 Too many exceptions

Programmers exacerbate the problems related to exception handling by defining unnecessary exceptions.

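exacerbate: to make worse, to aggravate

Most programmers are taught that it's important to detect and report errors.

They often interpret this to mean "the more errors detected, the better." This leads to an over-defensive style where anything that looks even slightly suspicious is rejected with an exception, which in turn leads to a proliferation of unnecessary exceptions that increase the complexity of the system.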

It's tempting to use exceptions to avoid dealing with difficult situations: rather than figuring out a clean way to handle it, just throw an exception and punt the problem to the caller.
Some might argue that this approach empowers callers, since it allows each caller to handle the exception in a different way.
However, if you are having trouble figuring out what to do for the particular situation, there's a good chance that the caller won't know what to do either.

In day-to-day development I do think this way: declare an exception and throw any problem outside the main flow up to the caller.
This lets the caller decide how to handle it, but it quietly adds to the system's complexity, because it only passes the problem to someone else instead of solving it.
If you don't know how to handle a special case, the caller probably won't know what to do either.

Generating an exception in a situation like this just passes the problem to someone else and adds to the system's complexity.

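The downside of simply throwing exceptions upward

Too many exceptions make a class shallow

The exceptions thrown by a class are part of its interface; **classes with lots of exceptions have complex interfaces, and they are shallower than classes with fewer exceptions.**

The exceptions a class throws are part of its interface.
A class with many exceptions has a complex interface, and it is shallower than a class with fewer exceptions.

An exception may propagate up through several stack levels before being caught, so it affects not just the method's caller, but potentially also higher-level callers (and their interfaces).

A thrown exception affects not only the direct caller but possibly higher-level callers as well.
propagate: to spread, to pass along
potentially: possibly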

Throwing exceptions is easy; handling them is hard.
Thus, the complexity of exceptions comes from the exception handling code.
The best way to reduce the complexity damage caused by exception handling is to reduce the number of places where exceptions have to be handled.

In short: the fewer places that handle exceptions, the less complexity they cause.

10.3 Define errors out of existence

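At first I read the title as "define errors that do not exist", which seemed to defy common sense: if an error no longer exists, how can it be defined? As I read on, I realized that is not the meaning; the point is to define the logic so that it has no error cases.
For example, if the unset command is defined as "delete a variable", that leads to

throwing an error when unset is asked to delete an unknown variable.

Change the definition of the command slightly
so that it ensures the variable no longer exists:

It is perfectly natural for unset to be invoked with the name of a variable that doesn't exist.
It should have simply returned without doing anything.

Then there is no exceptional case left to report.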
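The same redefinition can be sketched in Python, using a dict as a stand-in for the variable table. The names unset_strict and unset are hypothetical; the only point is the difference between the two definitions.

```python
def unset_strict(variables: dict[str, object], name: str) -> None:
    """Error-prone definition: "delete this variable"."""
    del variables[name]  # raises KeyError if the variable does not exist


def unset(variables: dict[str, object], name: str) -> None:
    """Redefined: "ensure this variable no longer exists"."""
    variables.pop(name, None)  # an absent name is a normal case, not an error


env = {"x": 1}
unset(env, "x")
unset(env, "x")  # the second call is a harmless no-op
print(env)
```

With the second definition, callers that clean up variables no longer need a try/except around every call.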

10.4 Example: file deletion in Windows

Windows

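The Windows operating system does not permit a file to be deleted if it is open in a process.

To delete the file, the user has to find the process that has it open and close it, or simply reboot the machine and then delete the file.

Unix

If a file is open when it is deleted, Unix does not delete the file immediately. Instead, it marks the file for deletion, then the delete operation returns successfully.

The file name is removed from its directory, so no other process can open the old file, and a new file with the same name can be created. The old file's data still exists, though: processes that already have the file open can continue to read and write it normally. Once all processes accessing the file have closed it, its data is freed.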

Defines away two errors

The Unix approach defines away two different kinds of errors.

First, the delete operation no longer returns an error if the file is currently in use; the delete succeeds, and the file will eventually be deleted.

Second, deleting a file that's in use does not create exceptions for the processes using the file.

One possible approach to this problem would have been to delete the file immediately and mark all of the opens of the file to disable them; any attempts by other processes to read or write the deleted file would fail.
This approach would create new errors for those processes to handle.
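The Unix behavior can be observed with a short Python script. This is only a sketch and it assumes a POSIX system (Linux or macOS); on Windows the os.remove call would typically fail because the file is still open. The file name is arbitrary.

```python
import os
import tempfile

# Assumes a POSIX system; the behavior shown here is not portable to Windows.
directory = tempfile.mkdtemp()
path = os.path.join(directory, "notes.txt")

with open(path, "w") as writer:
    writer.write("old contents")

reader = open(path)              # the file is still open somewhere
os.remove(path)                  # the delete returns successfully anyway
name_is_gone = not os.path.exists(path)

with open(path, "w") as writer:  # the name can be reused immediately
    writer.write("new contents")

old_data = reader.read()         # the open handle still sees the old data
reader.close()                   # last user closes; the old data can be freed

os.remove(path)                  # clean up the new file and the directory
os.rmdir(directory)
print(name_is_gone, old_data)
```

Neither the deleting code nor the reading code needs an error path here, which is exactly the pair of errors that the Unix design defines away.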

10.5 Example: Java substring method

Current behavior:

If either index is outside the range of the string, then substring throws IndexOutOfBoundsException.

Improvement:

The Java substring method would be easier to use if it adjusted out-of-range indices automatically, so that it implemented the following API: "returns the characters of the string (if any) with index greater than or equal to beginIndex and less than endIndex."
This defines the IndexOutOfBoundsException exception out of existence.

Many other languages have taken the error-free approach; for example, Python returns an empty result for out-of-range list slices.
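A minimal Python sketch of the proposed API. The function name and parameter names are hypothetical and mirror the quoted description rather than any real library. The indices are clamped explicitly, because a negative index in a Python slice would otherwise count from the end of the string.

```python
def substring(text: str, begin_index: int, end_index: int) -> str:
    """Return the characters with begin_index <= index < end_index, if any."""
    begin = max(begin_index, 0)          # clamp to the start of the string
    end = min(end_index, len(text))      # clamp to the end of the string
    if begin >= end:
        return ""                        # no characters in range: empty result
    return text[begin:end]


print(substring("hello", -3, 2))
print(substring("hello", 3, 100))
print(repr(substring("hello", 4, 2)))
print([1, 2, 3][5:10])  # built-in list slicing is already error-free
```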

Error-free approach vs. error-full approach

People sometimes counter that throwing errors will catch bugs; if errors are defined out of existence, won't that result in buggier software?
Perhaps this is why the Java developers decided that substring should throw exceptions.
The error-full approach may catch some bugs, but it also increases complexity, which results in other bugs.
With the error-full approach, developers must write additional code to avoid or ignore the errors, and this increases the likelihood of bugs; or they may forget to write the additional code, in which case unexpected errors may be thrown at runtime.

In contrast, defining errors out of existence simplifies APIs and it reduces the amount of code that must be written.

Overall, the best way to reduce bugs is to make software simpler.

10.6 Mask exceptions

The second technique for reducing the number of places where exceptions must be handled is exception masking.

That is, hide the exception so that fewer places ever have to handle it.

With this approach, an exceptional condition is detected and handled at a low level in the system, so that higher levels of software need not be aware of the condition.

The condition is detected and dealt with in the lower layers of the system, so the layers above never need to learn about it.

Example
The TCP transport protocol

In a network transport protocol such as TCP, packets can be dropped for various reasons such as corruption and congestion.
TCP masks packet loss by resending lost packets within its implementation, so all data eventually gets through and clients are unaware of the dropped packets.

Resending hides the loss of packets caused by corruption or congestion; the client never needs to know about data that failed to arrive the first time.

corruption: damage to data
congestion: overcrowding, overload
unaware: not knowing
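A toy Python sketch of masking by resending. This is not how TCP is implemented; FlakyChannel and ReliableSender are hypothetical classes that only show where the retry loop lives: inside the low-level layer, so callers of send never see a lost packet.

```python
class FlakyChannel:
    """Hypothetical lossy link: drops the first `drops` sends of each packet."""

    def __init__(self, drops: int) -> None:
        self.drops = drops
        self.attempts: dict[str, int] = {}
        self.delivered: list[str] = []

    def send_once(self, packet: str) -> bool:
        count = self.attempts.get(packet, 0) + 1
        self.attempts[packet] = count
        if count <= self.drops:
            return False  # packet lost; no acknowledgement
        self.delivered.append(packet)
        return True


class ReliableSender:
    """Masks packet loss by resending inside the low-level layer."""

    def __init__(self, channel: FlakyChannel) -> None:
        self.channel = channel

    def send(self, packet: str) -> None:
        # Callers never see a "packet lost" condition.
        while not self.channel.send_once(packet):
            pass  # a real protocol would wait for a timeout before resending


channel = FlakyChannel(drops=2)
sender = ReliableSender(channel)
for packet in ["a", "b"]:
    sender.send(packet)
print(channel.delivered)
```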

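NFS (Network File System)

A more controversial example of masking.

If an NFS file server crashes or fails to respond for any reason, clients reissue their requests to the server over and over again until the problem is eventually resolved.

If the NFS file server fails for any reason, the client simply retries over and over until it succeeds.

The controversy is that many users would rather have the operation abort with an exception than have the application hang while it retries indefinitely.

However, reporting exceptions would make things worse, not better.
One possibility would be for the application to retry the file operation, but this would still hang the application, and it's easier to perform the retry in one place in the NFS layer, rather than at every file system call in every application.
The other alternative is for applications to abort and return errors to their callers.
It's unlikely that the callers would know what to do either, so they would abort as well, resulting in a collapse of the user's working environment.

Alternative 1: when the application loses access to its files, all it can do is keep retrying at the application level, which still leaves the application hung.
Alternative 2: report the error upward. The caller does not know how to handle it either, so in the end the user's whole working environment may collapse. From the user's point of view, they cannot get any work done while the file server is down anyway, and all they can do is restart every application once it has been repaired.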

Exception masking doesn't work in all situations, but it is a powerful tool in the situations where it works.
It results in deeper classes, since it reduces the class's interface (fewer exceptions for users to be aware of) and adds functionality in the form of the code that masks the exception.
Exception masking is an example of pulling complexity downward.

10.7 Exception aggregation

The third technique for reducing complexity related to exceptions is exception aggregation.
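The idea is to handle many exceptions with a single piece of code; rather than writing distinct handlers for many individual exceptions, handle them all in one place with a single handler.

Handle the exceptions together in one place instead of handling each one separately.

Example
In a Web server that handles API calls, how should parameters be validated for the different API endpoints? A getParam method is provided to fetch the value of a given fieldName from the request. The choice is between catching the exception around each getParam call when a value is missing, or making all the getParam calls and handling any failure in a single place afterwards.

A better approach is to aggregate the exceptions. Instead of catching the exceptions in the individual service methods, let them propagate up to the top-level dispatch method for the Web server.

In each case, the error should result in an error response; the errors differ only in the error message to include in the response.
Thus, all conditions resulting in an error response can be handled with a single top-level exception handler.
The error message can be generated at the time the exception is thrown and included as a variable in the exception record.

In the missing-parameter case, the only difference between occurrences is the field name; each one needs to tell the user that some parameter is missing. So the error response can be produced in one shared place rather than handled separately at each call.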
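A minimal Python sketch of this aggregation. Everything here is hypothetical (the UserError class, the snake_case get_param helper, the two service functions, the URL table); it is not the API of any real Web framework. The service functions contain no error handling at all; the message is built where the error is detected, and only the dispatcher catches it.

```python
class UserError(Exception):
    """Hypothetical exception whose message is ready to show to the user."""


def get_param(request: dict[str, str], field_name: str) -> str:
    if field_name not in request:
        # The message is generated at the point where the error is detected.
        raise UserError(f"missing parameter: {field_name}")
    return request[field_name]


def create_user(request: dict[str, str]) -> str:
    name = get_param(request, "name")
    email = get_param(request, "email")
    return f"created {name} <{email}>"


def delete_user(request: dict[str, str]) -> str:
    return f"deleted {get_param(request, 'name')}"


HANDLERS = {"/create": create_user, "/delete": delete_user}


def dispatch(url: str, request: dict[str, str]) -> tuple[int, str]:
    """Top-level dispatcher: the only place UserError is caught."""
    try:
        handler = HANDLERS.get(url)
        if handler is None:
            raise UserError(f"unknown URL: {url}")
        return 200, handler(request)
    except UserError as error:
        # Knows how to build an error response, nothing about specific errors.
        return 400, str(error)


print(dispatch("/create", {"name": "ann"}))
```

Adding a new service function or a new kind of user error does not require touching the dispatcher.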

The aggregation described in the preceding paragraph has good properties from the standpoint of encapsulation and information hiding.
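The top-level exception handler encapsulates knowledge about how to generate error responses, but it knows nothing about specific errors; it just uses the error message provided in the exception.

From the standpoint of encapsulation and information hiding, aggregation works well. The top-level handler encapsulates how error responses are generated without needing to know about specific errors; it only uses the message carried by the exception.

Different subclasses of the exception can be defined for different conditions.

Different subclasses can be used to distinguish the different exceptional conditions.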

Exception aggregation works best if an exception propagates several levels up the stack before it is handled; this allows more exceptions from more methods to be handled in the same place.

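Aggregation is most useful when an exception can propagate upward, because then more exceptions from more methods can be handled in the same place.

Masking usually works best if an exception is handled in a low-level method.

Masking is usually applied where the exception can be dealt with in a low-level method. It appears mostly in base library methods, because such methods are called from many places, and otherwise every caller would need its own code to handle the exception.

Masking and aggregation are similar in that both approaches position an exception handler where it can catch the most exceptions, eliminating many handlers that would otherwise need to be created.

Masking and aggregation are alike in that both place the handler where it can catch the most exceptions, reducing the number of places where handling logic has to be written.

One way of thinking about exception aggregation is that it replaces several special-purpose mechanisms, each tailored for a particular situation, with a single general-purpose mechanism that can handle multiple situations.

Put simply, aggregation replaces many special-purpose mechanisms, each customized for one situation, with one general-purpose mechanism.

The book gives a RAMCloud example covering how RAMCloud handles crash recovery. Exception handling is unified into one approach: let the server crash, then recover. Since the system needs a recovery mechanism for server crashes anyway, this reduces the amount of exception handling code.
Still, I feel that letting a server crash should be a last resort rather than a way to unify exception handling.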

10.8 Just crash?

The fourth technique for reducing complexity related to exception handling is to crash the application.

In most applications there will be certain errors that it's not worth trying to handle.
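Typically, these errors are difficult or impossible to handle and don't occur very often. The simplest thing to do in response to these errors is to print diagnostic information and then abort the application.

For system-level errors that the software cannot resolve, terminating the application signals that the system is in a bad state.
For today's Internet applications, though, this way of handling errors is too blunt and would not be considered except in special circumstances.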

Example
"Out of memory" errors

Consider the malloc function in C, which returns NULL if it cannot allocate the desired block of memory. This is an unfortunate behavior, because it assumes that every single caller of malloc will check the return value and take appropriate action if there is no memory.

malloc in C returns NULL when no memory is available, so every call has to be followed by a NULL check, which adds complexity.

A better approach is to define a new method ckalloc, which calls malloc, checks the result, and aborts the application with an error message if memory is exhausted.
In newer languages such as C++ and Java, the new operator throws an exception if memory is exhausted.
Dynamically allocated memory is such a fundamental element of any modern application that it doesn't make sense for the application to continue if memory is exhausted; it's better to crash as soon as the error is detected.

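Catching an out-of-memory exception is not very useful, because the exception handler will probably try to allocate memory too, and that will fail as well.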
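The ckalloc idea can be imitated in Python as a rough analogy. try_allocate is a hypothetical stand-in for a malloc-like function that returns None on failure, with the available memory passed in as a plain number so the failure can be simulated; checked_allocate is the ckalloc-style wrapper that checks once, in one place, and aborts.

```python
import sys
from typing import Optional


def try_allocate(size: int, available: int) -> Optional[bytearray]:
    """Hypothetical malloc-like allocator: returns None when it cannot allocate."""
    if size > available:
        return None
    return bytearray(size)


def checked_allocate(size: int, available: int) -> bytearray:
    """ckalloc-style wrapper: callers never have to check for None."""
    block = try_allocate(size, available)
    if block is None:
        # sys.exit with a string reports the message and raises SystemExit.
        sys.exit(f"out of memory: could not allocate {size} bytes")
    return block


buffer = checked_allocate(16, available=1024)
print(len(buffer))
```

Callers of checked_allocate can use the result directly, so the None check exists in exactly one place.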

Other examples of errors where crashing the application makes sense:

If an I/O error occurs while reading or writing an open file (such as a hard disk error), or if a network socket cannot be opened, there's not much the application can do to recover, so aborting with a clear error message is a sensible approach.

Errors like these occur rarely, so they are unlikely to affect the availability of the system.

Aborting with an error message is also appropriate if an application encounters an internal error such as an inconsistent data structure.
Conditions like this probably indicate bugs in the program.

Whether or not it is acceptable to crash on a particular error depends on the application.

For a system that keeps replicated (backup) copies of data, aborting on an I/O error is not appropriate; instead, the system must use the replicated data to recover any information that was lost. That is where the value of such a system lies, even though the code involved is much more complex.

10.9 Design special cases out of existence

Special cases can result in code that is riddled with if statements, which make the code hard to understand and lead to bugs. Thus, special cases should be eliminated wherever possible.
The best way to do this is by designing the normal case in a way that automatically handles the special cases without any extra code.

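Redefining semantics in the normal flow to remove error conditions works just as well for special cases.

Example
A text editor lets the user select a range of text and then copy or delete it.
One implementation uses a state variable to indicate whether a selection exists. Since there are times when nothing is selected on screen, tracking this with a state variable feels natural. But it adds checks for the "no selection" condition, along with special handling for it.
If "no selection" is instead defined as an empty selection whose start and end positions are the same, the implementation no longer needs to distinguish whether a selection exists. Copying an empty selection copies nothing; deleting an empty selection needs no special handling either, since it just deletes the (empty) content between start and end.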
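A minimal Python sketch of the empty-selection design. The Editor class and its field names are hypothetical. There is no "selection exists" flag and no if statement; an empty selection is just the case where start equals end.

```python
from dataclasses import dataclass


@dataclass
class Editor:
    text: str
    selection_start: int = 0  # "no selection" is simply start == end
    selection_end: int = 0

    def copy(self) -> str:
        return self.text[self.selection_start:self.selection_end]

    def delete_selection(self) -> None:
        self.text = self.text[:self.selection_start] + self.text[self.selection_end:]
        self.selection_end = self.selection_start


editor = Editor("hello world")
print(repr(editor.copy()))   # an empty selection copies nothing
editor.delete_selection()    # and deletes nothing

editor.selection_start, editor.selection_end = 0, 6
editor.delete_selection()
print(editor.text)
```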

10.10 Taking it too far

Defining away exceptions, or masking them inside a module, only makes sense if the exception information isn't needed outside the module.
In the rare situations where a caller cares about the special cases detected by the exceptions, there are other ways for it to get this information.

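It is possible to take this idea too far, though. Suppose a network communication module handles all network exceptions internally, so callers cannot learn what went wrong and carry on as if nothing happened. An application using this module cannot tell whether messages were lost or a server failed.
In such cases the exceptions must be exposed, even though they add complexity to the system (the interface).

With exceptions, as with many other areas in software design, you must determine what is important and what is not important.
Things that are not important should be hidden, and the more of them the better. But when something is important, it must be exposed.

You have to separate what is important from what is not. Important things must be exposed; unimportant things should be hidden, the more the better.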

10.11 Conclusion

Special cases of any form make code harder to understand and increase the likelihood of bugs.

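The best way to reduce special cases is by redefining semantics to eliminate error conditions.
For exceptions that can't be defined away, you should look for opportunities to mask them at a low level, so their impact is limited, or aggregate several special-case handlers into a single more generic handler. Together, these techniques can have a significant impact on overall system complexity.

The best approach is to redefine semantics so that fewer error conditions exist.
For exceptions that cannot be defined away, look for opportunities to mask them at a low level so that their impact is limited,
or aggregate several special-case handlers into a single general-purpose handler.
Taken together, these techniques can have a significant impact on overall system complexity.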