17.1 文字处理

如果您有C或C++的经验，那么最开始可能会对Java控制文本的能力感到怀疑。事实上，我们最害怕的就是速度特别慢，这可能妨碍我们创造能力的发挥。然而，Java对应的工具（特别是String类）具有很强的功能，就象本节的例子展示的那样（而且性能也有一定程度的提升）。

正如大家即将看到的那样，建立这些例子的目的都是为了解决本书编制过程中遇到的一些问题。但是，它们的能力并非仅止于此。通过简单的改造，即可让它们在其他场合大显身手。除此以外，它们还揭示出了本书以前没有强调过的一项Java特性。

17.1.1 提取代码列表

对于本书每一个完整的代码列表（不是代码段），大家无疑会注意到它们都用特殊的注释记号起始与结束（'//:'和'///:~'）。之所以要包括这种标志信息，是为了能将代码从本书自动提取到兼容的源码文件中。在我的前一本书里，我设计了一个系统，可将测试过的代码文件自动合并到书中。但对于这本书，我发现一种更简便的做法是一旦通过了最初的测试，就把代码粘贴到书中。而且由于很难第一次就编译通过，所以我在书的内部编辑代码。但如何提取并测试代码呢？这个程序就是关键。如果你打算解决一个文字处理的问题，那么它也很有利用价值。该例也演示了String类的许多特性。

我首先将整本书都以ASCII文本格式保存成一个独立的文件。CodePackager程序有两种运行模式（在usageString有相应的描述）：如果使用-p标志，程序就会检查一个包含了ASCII文本（即本书的内容）的一个输入文件。它会遍历这个文件，按照注释记号提取出代码，并用位于第一行的文件名来决定创建文件使用什么名字。除此以外，在需要将文件置入一个特殊目录的时候，它还会检查package语句（根据由package语句指定的路径选择）。

但这样还不够。程序还要对包（package）名进行跟踪，从而监视章内发生的变化。由于每一章使用的所有包都以c02，c03，c04等等起头，用于标记它们所属的是哪一章（除那些以com起头的以外，它们在对不同的章进行跟踪的时候会被忽略）——只要每一章的第一个代码列表包含了一个package，所以CodePackager程序能知道每一章发生的变化，并将后续的文件放到新的子目录里。每个文件提取出来时，都会置入一个SourceCodeFile对象，随后再将那个对象置入一个集合（后面还会详尽讲述这个过程）。这些SourceCodeFile对象可以简单地保存在文件中，那正是本项目的第二个用途。如果直接调用CodePackager，不添加-p标志，它就会将一个“打包”文件作为输入。那个文件随后会被提取（释放）进入单独的文件。所以-p标志的意思就是提取出来的文件已被“打包”（packed）进入这个单一的文件。

但为什么还要如此麻烦地使用打包文件呢？这是由于不同的计算机平台用不同的方式在文件里保存文本信息。其中最大的问题是换行字符的表示方法；当然，还有可能存在另一些问题。然而，Java有一种特殊类型的IO数据流——DataOutputStream——它可以保证“无论数据来自何种机器，只要使用一个DataInputStream收取这些数据，就可用本机正确的格式保存它们”。也就是说，Java负责控制与不同平台有关的所有细节，而这正是Java最具魅力的一点。所以-p标志能将所有东西都保存到单一的文件里，并采用通用的格式。用户可从Web下载这个文件以及Java程序，然后对这个文件运行CodePackager，同时不指定-p标志，文件便会释放到系统中正确的场所（亦可指定另一个子目录；否则就在当前目录创建子目录）。为确保不会留下与特定平台有关的格式，凡是需要描述一个文件或路径的时候，我们就使用File对象。除此以外，还有一项特别的安全措施：在每个子目录里都放入一个空文件；那个文件的名字指出在那个子目录里应找到多少个文件。

下面是完整的代码，后面会对它进行详细的说明：

//: CodePackager.java
// "Packs" and "unpacks" the code in "Thinking 
// in Java" for cross-platform distribution.
/* Commented so CodePackager sees it and starts
   a new chapter directory, but so you don't 
   have to worry about the directory where this
   program lives:
package c17;
*/
import java.util.*;
import java.io.*;

class Pr {
  static void error(String e) {
    System.err.println("ERROR: " + e);
    System.exit(1);
  }
}

class IO {
  static BufferedReader disOpen(File f) {
    BufferedReader in = null;
    try {
      in = new BufferedReader(
        new FileReader(f));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static BufferedReader disOpen(String fname) {
    return disOpen(new File(fname));
  }
  static DataOutputStream dosOpen(File f) {
    DataOutputStream in = null;
    try {
      in = new DataOutputStream(
        new BufferedOutputStream(
          new FileOutputStream(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static DataOutputStream dosOpen(String fname) {
    return dosOpen(new File(fname));
  }
  static PrintWriter psOpen(File f) {
    PrintWriter in = null;
    try {
      in = new PrintWriter(
        new BufferedWriter(
          new FileWriter(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static PrintWriter psOpen(String fname) {
    return psOpen(new File(fname));
  }
  static void close(Writer os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(DataOutputStream os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(Reader os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
}

class SourceCodeFile {
  public static final String 
    startMarker = "//:", // Start of source file
    endMarker = "} ///:~", // End of source
    endMarker2 = "}; ///:~", // C++ file end
    beginContinue = "} ///:Continued",
    endContinue = "///:Continuing",
    packMarker = "###", // Packed file header tag
    eol = // Line separator on current system
      System.getProperty("line.separator"),
    filesep = // System's file path separator
      System.getProperty("file.separator");
  public static String copyright = "";
  static {
    try {
      BufferedReader cr =
        new BufferedReader(
          new FileReader("Copyright.txt"));
      String crin;
      while((crin = cr.readLine()) != null)
        copyright += crin + "\n";
      cr.close();
    } catch(Exception e) {
      copyright = "";
    }
  }
  private String filename, dirname,
    contents = new String();
  private static String chapter = "c02";
  // The file name separator from the old system:
  public static String oldsep;
  public String toString() {
    return dirname + filesep + filename;
  }
  // Constructor for parsing from document file:
  public SourceCodeFile(String firstLine, 
      BufferedReader in) {
    dirname = chapter;
    // Skip past marker:
    filename = firstLine.substring(
        startMarker.length()).trim();
    // Find space that terminates file name:
    if(filename.indexOf(' ') != -1)
      filename = filename.substring(
          0, filename.indexOf(' '));
    System.out.println("found: " + filename);
    contents = firstLine + eol;
    if(copyright.length() != 0)
      contents += copyright + eol;
    String s;
    boolean foundEndMarker = false;
    try {
      while((s = in.readLine()) != null) {
        if(s.startsWith(startMarker))
          Pr.error("No end of file marker for " +
            filename);
        // For this program, no spaces before 
        // the "package" keyword are allowed
        // in the input source code:
        else if(s.startsWith("package")) {
          // Extract package name:
          String pdir = s.substring(
            s.indexOf(' ')).trim();
          pdir = pdir.substring(
            0, pdir.indexOf(';')).trim();
          // Capture the chapter from the package
          // ignoring the 'com' subdirectories:
          if(!pdir.startsWith("com")) {
            int firstDot = pdir.indexOf('.');
            if(firstDot != -1)
              chapter = 
                pdir.substring(0,firstDot);
            else
              chapter = pdir;
          }
          // Convert package name to path name:
          pdir = pdir.replace(
            '.', filesep.charAt(0));
          System.out.println("package " + pdir);
          dirname = pdir;
        }
        contents += s + eol;
        // Move past continuations:
        if(s.startsWith(beginContinue))
          while((s = in.readLine()) != null)
            if(s.startsWith(endContinue)) {
              contents += s + eol;
              break;
            }
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          foundEndMarker = true;
          break;
        }
      }
      if(!foundEndMarker)
        Pr.error(
          "End marker not found before EOF");
      System.out.println("Chapter: " + chapter);
    } catch(IOException e) {
      Pr.error("Error reading line");
    }
  }
  // For recovering from a packed file:
  public SourceCodeFile(BufferedReader pFile) {
    try {
      String s = pFile.readLine();
      if(s == null) return;
      if(!s.startsWith(packMarker))
        Pr.error("Can't find " + packMarker
          + " in " + s);
      s = s.substring(
        packMarker.length()).trim();
      dirname = s.substring(0, s.indexOf('#'));
      filename = s.substring(s.indexOf('#') + 1);
      dirname = dirname.replace(
        oldsep.charAt(0), filesep.charAt(0));
      filename = filename.replace(
        oldsep.charAt(0), filesep.charAt(0));
      System.out.println("listing: " + dirname 
        + filesep + filename);
      while((s = pFile.readLine()) != null) {
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          contents += s;
          break;
        }
        contents += s + eol;
      }
    } catch(IOException e) {
      System.err.println("Error reading line");
    }
  }
  public boolean hasFile() { 
    return filename != null; 
  }
  public String directory() { return dirname; }
  public String filename() { return filename; }
  public String contents() { return contents; }
  // To write to a packed file:
  public void writePacked(DataOutputStream out) {
    try {
      out.writeBytes(
        packMarker + dirname + "#" 
        + filename + eol);
      out.writeBytes(contents);
    } catch(IOException e) {
      Pr.error("writing " + dirname + 
        filesep + filename);
    }
  }
  // To generate the actual file:
  public void writeFile(String rootpath) {
    File path = new File(rootpath, dirname);
    path.mkdirs();
    PrintWriter p =
      IO.psOpen(new File(path, filename));
    p.print(contents);
    IO.close(p);
  }
}

class DirMap {
  private Hashtable t = new Hashtable();
  private String rootpath;
  DirMap() {
    rootpath = System.getProperty("user.dir");
  }
  DirMap(String alternateDir) {
    rootpath = alternateDir;
  }
  public void add(SourceCodeFile f){
    String path = f.directory();
    if(!t.containsKey(path))
      t.put(path, new Vector());
    ((Vector)t.get(path)).addElement(f);
  }
  public void writePackedFile(String fname) {
    DataOutputStream packed = IO.dosOpen(fname);
    try {
      packed.writeBytes("###Old Separator:" +
        SourceCodeFile.filesep + "###\n");
    } catch(IOException e) {
      Pr.error("Writing separator to " + fname);
    }
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      System.out.println(
        "Writing directory " + dir);
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f = 
          (SourceCodeFile)v.elementAt(i);
        f.writePacked(packed);
      }
    }
    IO.close(packed);
  }
  // Write all the files in their directories:
  public void write() {
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f = 
          (SourceCodeFile)v.elementAt(i);
        f.writeFile(rootpath);
      }
      // Add file indicating file quantity
      // written to this directory as a check:
      IO.close(IO.dosOpen(
        new File(new File(rootpath, dir),
          Integer.toString(v.size())+".files")));
    }
  }
}

public class CodePackager {
  private static final String usageString =
  "usage: java CodePackager packedFileName" +
  "\nExtracts source code files from packed \n" +
  "version of Tjava.doc sources into " +
  "directories off current directory\n" +
  "java CodePackager packedFileName newDir\n" +
  "Extracts into directories off newDir\n" +
  "java CodePackager -p source.txt packedFile" +
  "\nCreates packed version of source files" +
  "\nfrom text version of Tjava.doc";
  private static void usage() {
    System.err.println(usageString);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length == 0) usage();
    if(args[0].equals("-p")) {
      if(args.length != 3)
        usage();
      createPackedFile(args);
    }
    else {
      if(args.length > 2)
        usage();
      extractPackedFile(args);
    }
  }
  private static String currentLine; 
  private static BufferedReader in;
  private static DirMap dm;
  private static void 
  createPackedFile(String[] args) {
    dm = new DirMap();
    in = IO.disOpen(args[1]);
    try {
      while((currentLine = in.readLine()) 
          != null) {
        if(currentLine.startsWith(
            SourceCodeFile.startMarker)) {
          dm.add(new SourceCodeFile(
                   currentLine, in));
        }
        else if(currentLine.startsWith(
            SourceCodeFile.endMarker))
          Pr.error("file has no start marker");
        // Else ignore the input line
      }
    } catch(IOException e) {
      Pr.error("Error reading " + args[1]);
    }
    IO.close(in);
    dm.writePackedFile(args[2]);
  }
  private static void 
  extractPackedFile(String[] args) {
    if(args.length == 2) // Alternate directory
      dm = new DirMap(args[1]);
    else // Current directory
      dm = new DirMap();
    in = IO.disOpen(args[0]);
    String s = null;
    try {
       s = in.readLine();
    } catch(IOException e) {
      Pr.error("Cannot read from " + in);
    }
    // Capture the separator used in the system
    // that packed the file:
    if(s.indexOf("###Old Separator:") != -1 ) {
      String oldsep = s.substring(
        "###Old Separator:".length());
      oldsep = oldsep.substring(
        0, oldsep. indexOf('#'));
      SourceCodeFile.oldsep = oldsep;
    }
    SourceCodeFile sf = new SourceCodeFile(in);
    while(sf.hasFile()) {
      dm.add(sf);
      sf = new SourceCodeFile(in);
    }
    dm.write();
  }
} ///:~

我们注意到package语句已经作为注释标志出来了。由于这是本章的第一个程序，所以package语句是必需的，用它告诉CodePackager已改换到另一章。但是把它放入包里却会成为一个问题。当我们创建一个包的时候，需要将结果程序同一个特定的目录结构联系在一起，这一做法对本书的大多数例子都是适用的。但在这里，CodePackager程序必须在一个专用的目录里编译和运行，所以package语句作为注释标记出去。但对CodePackager来说，它“看起来”依然象一个普通的package语句，因为程序还不是特别复杂，不能侦查到多行注释（没有必要做得这么复杂，这里只要求方便就行）。

头两个类是“支持／工具”类，作用是使程序剩余的部分在编写时更加连贯，也更便于阅读。第一个是Pr，它类似ANSI C的perror库，两者都能打印出一条错误提示消息（但同时也会退出程序）。第二个类将文件的创建过程封装在内，这个过程已在第10章介绍过了；大家已经知道，这样做很快就会变得非常累赘和麻烦。为解决这个问题，第10章提供的方案致力于新类的创建，但这儿的“静态”方法已经使用过了。在那些方法中，正常的异常会被捕获，并相应地进行处理。这些方法使剩余的代码显得更加清爽，更易阅读。

帮助解决问题的第一个类是SourceCodeFile（源码文件），它代表本书一个源码文件包含的所有信息（内容、文件名以及目录）。它同时还包含了一系列String常数，分别代表一个文件的开始与结束；在打包文件内使用的一个标记；当前系统的换行符；文件路径分隔符（注意要用System.getProperty()侦查本地版本是什么）；以及一大段版权声明，它是从下面这个Copyright.txt文件里提取出来的：

//////////////////////////////////////////////////
// Copyright (c) Bruce Eckel, 1998
// Source code file from the book "Thinking in Java"
// All rights reserved EXCEPT as allowed by the
// following statements: You may freely use this file
// for your own work (personal or commercial),
// including modifications and distribution in
// executable form only. Permission is granted to use
// this file in classroom situations, including its
// use in presentation materials, as long as the book
// "Thinking in Java" is cited as the source. 
// Except in classroom situations, you may not copy
// and distribute this code; instead, the sole
// distribution point is http://www.BruceEckel.com 
// (and official mirror sites) where it is
// freely available. You may not remove this
// copyright and notice. You may not distribute
// modified versions of the source code in this
// package. You may not use this file in printed
// media without the express permission of the
// author. Bruce Eckel makes no representation about
// the suitability of this software for any purpose.
// It is provided "as is" without express or implied
// warranty of any kind, including any implied
// warranty of merchantability, fitness for a
// particular purpose or non-infringement. The entire
// risk as to the quality and performance of the
// software is with you. Bruce Eckel and the
// publisher shall not be liable for any damages
// suffered by you or any third party as a result of
// using or distributing software. In no event will
// Bruce Eckel or the publisher be liable for any
// lost revenue, profit, or data, or for direct,
// indirect, special, consequential, incidental, or
// punitive damages, however caused and regardless of
// the theory of liability, arising out of the use of
// or inability to use software, even if Bruce Eckel
// and the publisher have been advised of the
// possibility of such damages. Should the software
// prove defective, you assume the cost of all
// necessary servicing, repair, or correction. If you
// think you've found an error, please email all
// modified files with clearly commented changes to:
// Bruce@EckelObjects.com. (please use the same
// address for non-code errors found in the book).
//////////////////////////////////////////////////

从一个打包文件中提取文件时，当初所用系统的文件分隔符也会标注出来，以便用本地系统适用的符号替换它。

当前章的子目录保存在chapter字段中，它初始化成c02（大家可注意一下第2章的列表正好没有包含一个打包语句）。只有在当前文件里发现一个package（打包）语句时，chapter字段才会发生改变。

构建一个打包文件

第一个构造器用于从本书的ASCII文本版里提取出一个文件。发出调用的代码（在列表里较深的地方）会读入并检查每一行，直到找到与一个列表的开头相符的为止。在这个时候，它就会新建一个SourceCodeFile对象，将第一行的内容（已经由调用代码读入了）传递给它，同时还要传递BufferedReader对象，以便在这个缓冲区中提取源码列表剩余的内容。

从这时起，大家会发现String方法被频繁运用。为提取出文件名，需调用substring()的重载版本，令其从一个起始偏移开始，一直读到字串的末尾，从而形成一个“子串”。为算出这个起始索引，先要用length()得出startMarker的总长，再用trim()删除字串头尾多余的空格。第一行在文件名后也可能有一些字符；它们是用indexOf()侦测出来的。若没有发现找到我们想寻找的字符，就返回-1；若找到那些字符，就返回它们第一次出现的位置。注意这也是indexOf()的一个重载版本，采用一个字串作为参数，而非一个字符。

解析出并保存好文件名后，第一行会被置入字串contents中（该字串用于保存源码清单的完整正文）。随后，将剩余的代码行读入，并合并进入contents字串。当然事情并没有想象的那么简单，因为特定的情况需加以特别的控制。一种情况是错误检查：若直接遇到一个startMarker（起始标记），表明当前操作的这个代码列表没有设置一个结束标记。这属于一个出错条件，需要退出程序。

另一种特殊情况与package关键字有关。尽管Java是一种自由形式的语言，但这个程序要求package关键字必须位于行首。若发现package关键字，就通过检查位于开头的空格以及位于末尾的分号，从而提取出包名（注意亦可一次单独的操作实现，方法是使用重载的substring()，令其同时检查起始和结束索引位置）。随后，将包名中的点号替换成特定的文件分隔符——当然，这里要假设文件分隔符仅有一个字符的长度。尽管这个假设可能对目前的所有系统都是适用的，但一旦遇到问题，一定不要忘了检查一下这里。默认操作是将每一行都连接到contents里，同时还有换行字符，直到遇到一个endMarker（结束标记）为止。该标记指出构造器应当停止了。若在endMarker之前遇到了文件结尾，就认为存在一个错误。

从打包文件中提取

第二个构造器用于将源码文件从打包文件中恢复（提取）出来。在这儿，作为调用者的方法不必担心会跳过一些中间文本。打包文件包含了所有源码文件，它们相互间紧密地靠在一起。需要传递给该构造器的仅仅是一个BufferedReader，它代表着“信息源”。构造器会从中提取出自己需要的信息。但在每个代码列表开始的地方还有一些配置信息，它们的身份是用packMarker（打包标记）指出的。若packMarker不存在，意味着调用者试图用错误的方法来使用这个构造器。

一旦发现packMarker，就会将其剥离出来，并提取出目录名（用一个'#'结尾）以及文件名（直到行末）。不管在哪种情况下，旧分隔符都会被替换成本地适用的一个分隔符，这是用String replace()方法实现的。老的分隔符被置于打包文件的开头，在代码列表稍靠后的一部分即可看到是如何把它提取出来的。

构造器剩下的部分就非常简单了。它读入每一行，把它合并到contents里，直到遇见endMarker为止。

程序列表的存取

接下来的一系列方法是简单的访问器：directory()、filename()（注意方法可能与字段有相同的拼写和大小写形式）和contents()。而hasFile()用于指出这个对象是否包含了一个文件（很快就会知道为什么需要这个）。

最后三个方法致力于将这个代码列表写进一个文件——要么通过writePacked()写入一个打包文件，要么通过writeFile()写入一个Java源码文件。writePacked()需要的唯一东西就是DataOutputStream，它是在别的地方打开的，代表着准备写入的文件。它先把头信息置入第一行，再调用writeBytes()将contents（内容）写成一种“通用”格式。

准备写Java源码文件时，必须先把文件建好。这是用IO.psOpen()实现的。我们需要向它传递一个File对象，其中不仅包含了文件名，也包含了路径信息。但现在的问题是：这个路径实际存在吗？用户可能决定将所有源码目录都置入一个完全不同的子目录，那个目录可能是尚不存在的。所以在正式写每个文件之前，都要调用File.mkdirs()方法，建好我们想向其中写入文件的目录路径。它可一次性建好整个路径。

整套列表的包容

以子目录的形式组织代码列表是非常方便的，尽管这要求先在内存中建好整套列表。之所以要这样做，还有另一个很有说服力的原因：为了构建更“健康”的系统。也就是说，在创建代码列表的每个子目录时，都会加入一个额外的文件，它的名字包含了那个目录内应有的文件数目。

DirMap类可帮助我们实现这一效果，并有效地演示了一个“多重映射”的概述。这是通过一个散列表（Hashtable）实现的，它的“键”是准备创建的子目录，而“值”是包含了那个特定目录中的SourceCodeFile对象的Vector对象。所以，我们在这儿并不是将一个“键”映射（或对应）到一个值，而是通过对应的Vector，将一个键“多重映射”到一系列值。尽管这听起来似乎很复杂，但具体实现时却是非常简单和直接的。大家可以看到，DirMap类的大多数代码都与向文件中的写入有关，而非与“多重映射”有关。与它有关的代码仅极少数而已。

可通过两种方式建立一个DirMap（目录映射或对应）关系：默认构造器假定我们希望目录从当前位置向下展开，而另一个构造器让我们为起始目录指定一个备用的“绝对”路径。

add()方法是一个采取的行动比较密集的场所。首先将directory()从我们想添加的SourceCodeFile里提取出来，然后检查散列表（Hashtable），看看其中是否已经包含了那个键。如果没有，就向散列表加入一个新的Vector，并将它同那个键关联到一起。到这时，不管采取的是什么途径，Vector都已经就位了，可以将它提取出来，以便添加SourceCodeFile。由于Vector可象这样同散列表方便地合并到一起，所以我们从两方面都能感觉得非常方便。

写一个打包文件时，需打开一个准备写入的文件（当作DataOutputStream打开，使数据具有“通用”性），并在第一行写入与老的分隔符有关的头信息。接着产生对Hashtable键的一个Enumeration（枚举），并遍历其中，选择每一个目录，并取得与那个目录有关的Vector，使那个Vector中的每个SourceCodeFile都能写入打包文件中。

用write()将Java源码文件写入它们对应的目录时，采用的方法几乎与writePackedFile()完全一致，因为两个方法都只需简单调用SourceCodeFile中适当的方法。但在这里，根路径会传递给SourceCodeFile.writeFile()。所有文件都写好后，名字中指定了已写文件数量的那个附加文件也会被写入。

主程序

前面介绍的那些类都要在CodePackager中用到。大家首先看到的是用法字串。一旦最终用户不正确地调用了程序，就会打印出介绍正确用法的这个字串。调用这个字串的是usage()方法，同时还要退出程序。main()唯一的任务就是判断我们希望创建一个打包文件，还是希望从一个打包文件中提取什么东西。随后，它负责保证使用的是正确的参数，并调用适当的方法。

创建一个打包文件时，它默认位于当前目录，所以我们用默认构造器创建DirMap。打开文件后，其中的每一行都会读入，并检查是否符合特殊的条件：

(1) 若行首是一个用于源码列表的起始标记，就新建一个SourceCodeFile对象。构造器会读入源码列表剩下的所有内容。结果产生的指针将直接加入DirMap。

(2) 若行首是一个用于源码列表的结束标记，表明某个地方出现错误，因为结束标记应当只能由SourceCodeFile构造器发现。

提取／释放一个打包文件时，提取出来的内容可进入当前目录，亦可进入另一个备用目录。所以需要相应地创建DirMap对象。打开文件，并将第一行读入。老的文件路径分隔符信息将从这一行中提取出来。随后根据输入来创建第一个SourceCodeFile对象，它会加入DirMap。只要包含了一个文件，新的SourceCodeFile对象就会创建并加入（创建的最后一个用光输入内容后，会简单地返回，然后hasFile()会返回一个错误）。

17.1.2 检查大小写样式

尽管对涉及文字处理的一些项目来说，前例显得比较方便，但下面要介绍的项目却能立即发挥作用，因为它执行的是一个样式检查，以确保我们的大小写形式符合“事实上”的Java样式标准。它会在当前目录中打开每个.java文件，并提取出所有类名以及标识符。若发现有不符合Java样式的情况，就向我们提出报告。

为了让这个程序正确运行，首先必须构建一个类名，将它作为一个“仓库”，负责容纳标准Java库中的所有类名。为达到这个目的，需遍历用于标准Java库的所有源码子目录，并在每个子目录都运行ClassScanner。至于参数，则提供仓库文件的名字（每次都用相同的路径和名字）和命令行开关-a，指出类名应当添加到该仓库文件中。

为了用程序检查自己的代码，需要运行它，并向它传递要使用的仓库文件的路径与名字。它会检查当前目录中的所有类和标识符，并告诉我们哪些没有遵守典型的Java大写写规范。

要注意这个程序并不是十全十美的。有些时候，它可能报告自己查到一个问题。但当我们仔细检查代码的时候，却发现没有什么需要更改的。尽管这有点儿烦人，但仍比自己动手检查代码中的所有错误强得多。

下面列出源代码，后面有详细的解释：

//: ClassScanner.java
// Scans all files in directory for classes
// and identifiers, to check capitalization.
// Assumes properly compiling code listings.
// Doesn't do everything right, but is a very
// useful aid.
import java.io.*;
import java.util.*;

class MultiStringMap extends Hashtable {
  public void add(String key, String value) {
    if(!containsKey(key))
      put(key, new Vector());
    ((Vector)get(key)).addElement(value);
  }
  public Vector getVector(String key) {
    if(!containsKey(key)) {
      System.err.println(
        "ERROR: can't find key: " + key);
      System.exit(1);
    }
    return (Vector)get(key);
  }
  public void printValues(PrintStream p) {
    Enumeration k = keys();
    while(k.hasMoreElements()) {
      String oneKey = (String)k.nextElement();
      Vector val = getVector(oneKey);
      for(int i = 0; i < val.size(); i++)
        p.println((String)val.elementAt(i));
    }
  }
}

public class ClassScanner {
  private File path;
  private String[] fileList;
  private Properties classes = new Properties();
  private MultiStringMap 
    classMap = new MultiStringMap(),
    identMap = new MultiStringMap();
  private StreamTokenizer in;
  public ClassScanner() {
    path = new File(".");
    fileList = path.list(new JavaFilter());
    for(int i = 0; i < fileList.length; i++) {
      System.out.println(fileList[i]);
      scanListing(fileList[i]);
    }
  }
  void scanListing(String fname) {
    try {
      in = new StreamTokenizer(
          new BufferedReader(
            new FileReader(fname)));
      // Doesn't seem to work:
      // in.slashStarComments(true);
      // in.slashSlashComments(true);
      in.ordinaryChar('/');
      in.ordinaryChar('.');
      in.wordChars('_', '_');
      in.eolIsSignificant(true);
      while(in.nextToken() != 
            StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          eatComments();
        else if(in.ttype == 
                StreamTokenizer.TT_WORD) {
          if(in.sval.equals("class") || 
             in.sval.equals("interface")) {
            // Get class name:
               while(in.nextToken() != 
                     StreamTokenizer.TT_EOF
                     && in.ttype != 
                     StreamTokenizer.TT_WORD)
                 ;
               classes.put(in.sval, in.sval);
               classMap.add(fname, in.sval);
          }
          if(in.sval.equals("import") ||
             in.sval.equals("package"))
            discardLine();
          else // It's an identifier or keyword
            identMap.add(fname, in.sval);
        }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  void discardLine() {
    try {
      while(in.nextToken() != 
            StreamTokenizer.TT_EOF
            && in.ttype != 
            StreamTokenizer.TT_EOL)
        ; // Throw away tokens to end of line
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  // StreamTokenizer's comment removal seemed
  // to be broken. This extracts them:
  void eatComments() {
    try {
      if(in.nextToken() != 
         StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          discardLine();
        else if(in.ttype != '*')
          in.pushBack();
        else 
          while(true) {
            if(in.nextToken() == 
              StreamTokenizer.TT_EOF)
              break;
            if(in.ttype == '*')
              if(in.nextToken() != 
                StreamTokenizer.TT_EOF
                && in.ttype == '/')
                break;
          }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  public String[] classNames() {
    String[] result = new String[classes.size()];
    Enumeration e = classes.keys();
    int i = 0;
    while(e.hasMoreElements())
      result[i++] = (String)e.nextElement();
    return result;
  }
  public void checkClassNames() {
    Enumeration files = classMap.keys();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector cls = classMap.getVector(file);
      for(int i = 0; i < cls.size(); i++) {
        String className = 
          (String)cls.elementAt(i);
        if(Character.isLowerCase(
             className.charAt(0)))
          System.out.println(
            "class capitalization error, file: "
            + file + ", class: " 
            + className);
      }
    }
  }
  public void checkIdentNames() {
    Enumeration files = identMap.keys();
    Vector reportSet = new Vector();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector ids = identMap.getVector(file);
      for(int i = 0; i < ids.size(); i++) {
        String id = 
          (String)ids.elementAt(i);
        if(!classes.contains(id)) {
          // Ignore identifiers of length 3 or
          // longer that are all uppercase
          // (probably static final values):
          if(id.length() >= 3 &&
             id.equals(
               id.toUpperCase()))
            continue;
          // Check to see if first char is upper:
          if(Character.isUpperCase(id.charAt(0))){
            if(reportSet.indexOf(file + id)
                == -1){ // Not reported yet
              reportSet.addElement(file + id);
              System.out.println(
                "Ident capitalization error in:"
                + file + ", ident: " + id);
            }
          }
        }
      }
    }
  }
  static final String usage =
    "Usage: \n" + 
    "ClassScanner classnames -a\n" +
    "\tAdds all the class names in this \n" +
    "\tdirectory to the repository file \n" +
    "\tcalled 'classnames'\n" +
    "ClassScanner classnames\n" +
    "\tChecks all the java files in this \n" +
    "\tdirectory for capitalization errors, \n" +
    "\tusing the repository file 'classnames'";
  private static void usage() {
    System.err.println(usage);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length < 1 || args.length > 2)
      usage();
    ClassScanner c = new ClassScanner();
    File old = new File(args[0]);
    if(old.exists()) {
      try {
        // Try to open an existing 
        // properties file:
        InputStream oldlist =
          new BufferedInputStream(
            new FileInputStream(old));
        c.classes.load(oldlist);
        oldlist.close();
      } catch(IOException e) {
        System.err.println("Could not open "
          + old + " for reading");
        System.exit(1);
      }
    }
    if(args.length == 1) {
      c.checkClassNames();
      c.checkIdentNames();
    }
    // Write the class names to a repository:
    if(args.length == 2) {
      if(!args[1].equals("-a"))
        usage();
      try {
        BufferedOutputStream out =
          new BufferedOutputStream(
            new FileOutputStream(args[0]));
        c.classes.save(out,
          "Classes found by ClassScanner.java");
        out.close();
      } catch(IOException e) {
        System.err.println(
          "Could not write " + args[0]);
        System.exit(1);
      }
    }
  }
}

class JavaFilter implements FilenameFilter {
  public boolean accept(File dir, String name) {
    // Strip path information:
    String f = new File(name).getName();
    return f.trim().endsWith(".java");
  }
} ///:~

MultiStringMap类是个特殊的工具，允许我们将一组字串与每个键项对应（映射）起来。和前例一样，这里也使用了一个散列表（Hashtable），不过这次设置了继承。该散列表将键作为映射成为Vector值的单一的字串对待。add()方法的作用很简单，负责检查散列表里是否存在一个键。如果不存在，就在其中放置一个。getVector()方法为一个特定的键产生一个Vector；而printValues()将所有值逐个Vector地打印出来，这对程序的调试非常有用。

为简化程序，来自标准Java库的类名全都置入一个Properties（属性）对象中（来自标准Java库）。记住Properties对象实际是个散列表，其中只容纳了用于键和值项的String对象。然而仅需一次方法调用，我们即可把它保存到磁盘，或者从磁盘中恢复。实际上，我们只需要一个名字列表，所以为键和值都使用了相同的对象。

针对特定目录中的文件，为找出相应的类与标识符，我们使用了两个MultiStringMap：classMap以及identMap。此外在程序启动的时候，它会将标准类名仓库装载到名为classes的Properties对象中。一旦在本地目录发现了一个新类名，也会将其加入classes以及classMap。这样一来，classMap就可用于在本地目录的所有类间遍历，而且可用classes检查当前标记是不是一个类名（它标记着对象或方法定义的开始，所以收集接下去的记号——直到碰到一个分号——并将它们都置入identMap）。

ClassScanner的默认构造器会创建一个由文件名构成的列表（采用FilenameFilter的JavaFilter实现形式，参见第10章）。随后会为每个文件名都调用scanListing()。

在scanListing()内部，会打开源码文件，并将其转换成一个StreamTokenizer。根据Java帮助文档，将true传递给slashStartComments()和slashSlashComments()的本意应当是剥除那些注释内容，但这样做似乎有些问题（在Java 1.0中几乎无效）。所以相反，那些行被当作注释标记出去，并用另一个方法来提取注释。为达到这个目的，'/'必须作为一个原始字符捕获，而不是让StreamTokeinzer将其当作注释的一部分对待。此时要用ordinaryChar()方法指示StreamTokenizer采取正确的操作。同样的道理也适用于点号（'.'），因为我们希望让方法调用分离出单独的标识符。但对下划线来说，它最初是被StreamTokenizer当作一个单独的字符对待的，但此时应把它留作标识符的一部分，因为它在static final值中以TT_EOF等等形式使用。当然，这一点只对目前这个特殊的程序成立。wordChars()方法需要取得我们想添加的一系列字符，把它们留在作为一个单词看待的记号中。最后，在解析单行注释或者放弃一行的时候，我们需要知道一个换行动作什么时候发生。所以通过调用eollsSignificant(true)，换行符（EOL）会被显示出来，而不是被StreamTokenizer吸收。

scanListing()剩余的部分将读入和检查记号，直至文件尾。一旦nextToken()返回一个final static值——StreamTokenizer.TT_EOF，就标志着已经抵达文件尾部。

若记号是个'/'，意味着它可能是个注释，所以就调用eatComments()，对这种情况进行处理。我们在这儿唯一感兴趣的其他情况是它是否为一个单词，当然还可能存在另一些特殊情况。

如果单词是class（类）或interface（接口），那么接着的记号就应当代表一个类或接口名字，并将其置入classes和classMap。若单词是import或者package，那么我们对这一行剩下的东西就没什么兴趣了。其他所有东西肯定是一个标识符（这是我们感兴趣的），或者是一个关键字（对此不感兴趣，但它们采用的肯定是小写形式，所以不必兴师动众地检查它们）。它们将加入到identMap。

discardLine()方法是一个简单的工具，用于查找行末位置。注意每次得到一个新记号时，都必须检查行末。

只要在主解析循环中碰到一个正斜杠，就会调用eatComments()方法。然而，这并不表示肯定遇到了一条注释，所以必须将接着的记号提取出来，检查它是一个正斜杠（那么这一行会被丢弃），还是一个星号。但假如两者都不是，意味着必须在主解析循环中将刚才取出的记号送回去！幸运的是，pushBack()方法允许我们将当前记号“压回”输入数据流。所以在主解析循环调用nextToken()的时候，它能正确地得到刚才送回的东西。

为方便起见，classNames()方法产生了一个数组，其中包含了classes集合中的所有名字。这个方法未在程序中使用，但对代码的调试非常有用。

接下来的两个方法是实际进行检查的地方。在checkClassNames()中，类名从classMap提取出来（请记住，classMap只包含了这个目录内的名字，它们按文件名组织，所以文件名可能伴随错误的类名打印出来）。为做到这一点，需要取出每个关联的Vector，并遍历其中，检查第一个字符是否为小写。若确实为小写，则打印出相应的出错提示消息。

在checkIdentNames()中，我们采用了一种类似的方法：每个标识符名字都从identMap中提取出来。如果名字不在classes列表中，就认为它是一个标识符或者关键字。此时会检查一种特殊情况：如果标识符的长度等于3或者更长，而且所有字符都是大写的，则忽略此标识符，因为它可能是一个static final值，比如TT_EOF。当然，这并不是一种完美的算法，但它假定我们最终会注意到任何全大写标识符都是不合适的。

这个方法并不是报告每一个以大写字符开头的标识符，而是跟踪那些已在一个名为reportSet()的Vector中报告过的。它将Vector当作一个“集合”对待，告诉我们一个项目是否已在那个集合中。该项目是通过将文件名和标识符连接起来生成的。若元素不在集合中，就加入它，然后产生报告。

程序列表剩下的部分由main()构成，它负责控制命令行参数，并判断我们是准备在标准Java库的基础上构建由一系列类名构成的“仓库”，还是想检查已写好的那些代码的正确性。不管在哪种情况下，都会创建一个ClassScanner对象。

无论准备构建一个“仓库”，还是准备使用一个现成的，都必须尝试打开现有仓库。通过创建一个File对象并测试是否存在，就可决定是否打开文件并在ClassScanner中装载classes这个Properties列表（使用load()）。来自仓库的类将追加到由ClassScanner构造器发现的类后面，而不是将其覆盖。如果仅提供一个命令行参数，就意味着自己想对类名和标识符名字进行一次检查。但假如提供两个参数（第二个是"-a"），就表明自己想构成一个类名仓库。在这种情况下，需要打开一个输出文件，并用Properties.save()方法将列表写入一个文件，同时用一个字串提供文件头信息。

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

17.1 文字处理.md

17.1 文字处理.md

17.1 文字处理

Files

17.1 文字处理.md

Latest commit

History

17.1 文字处理.md

File metadata and controls

17.1 文字处理