Ubuntu Chinese Forum

Posted: **2009-12-15 18:39**

I've run into something that may or may not be a bug. I have a test file (test.txt) with the following contents:
● ○
● ○
○
● ○
● ○
● ○
● ○
● ○
● ○
● ○
○ ●
● ○
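That's only 12 lines, UTF-8 encoded, with a space between the two symbols and no whitespace before the newline.
Running `sort < test.txt` gives output identical to the original file. Stranger still,
running `uniq -c < test.txt` outputs:
12 ● ○

In other words, do sort and uniq treat the three Unicode character combinations “● ○”, “○ ●” and “○” as identical?
I tried sort options such as -R and -n, and none of them made any difference...
Why is this happening?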
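Why a single line with a count of 12? uniq -c only counts runs of adjacent lines that its comparison reports as equal. The sketch below is a simplified, hypothetical model of that counting loop in C, not GNU uniq's actual code; `uniq_count`, `cmp_bytes` and `cmp_all_equal` are illustrative names, and `cmp_all_equal` stands in for a collation that, by assumption, gives these symbols no weight.

```c
#include <stdio.h>
#include <string.h>

/* Comparator: returns 0 when two lines count as "the same". */
typedef int (*line_cmp)(const char *a, const char *b);

/* Byte-wise comparison, as in the C/POSIX locale. */
static int cmp_bytes(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Hypothetical comparator standing in for a collation that gives the
   symbols in these lines no weight: every pair compares equal. */
static int cmp_all_equal(const char *a, const char *b)
{
    (void)a;
    (void)b;
    return 0;
}

/* Simplified model of uniq -c: each run of adjacent lines that cmp reports
   as equal to the run's first line becomes one group. Prints "count line"
   for each group, stores the counts, and returns the number of groups. */
static size_t uniq_count(const char *lines[], size_t n, line_cmp cmp,
                         size_t counts[])
{
    size_t groups = 0;
    size_t i = 0;
    while (i < n) {
        size_t j = i + 1;
        while (j < n && cmp(lines[i], lines[j]) == 0)
            j++;
        counts[groups++] = j - i;
        printf("%7zu %s\n", j - i, lines[i]);
        i = j;
    }
    return groups;
}
```

With `cmp_bytes`, the lines `● ○`, `● ○`, `○ ●`, `● ○` form three groups (2, 1, 1); with `cmp_all_equal` they collapse into one group of 4, which mirrors the single `12 ● ○` line in the post.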

Posted: **2009-12-15 19:02**


static int
compare (const struct line *a, const struct line *b)
{
  int diff;
  size_t alen, blen;

  /* First try to compare on the specified keys (if any).
     The only two cases with no key at all are unadorned sort,
     and unadorned sort -r. */
  if (keylist)
    {    
      diff = keycompare (a, b);
      if (diff || unique || stable)
        return diff;
    }    

  /* If the keys all compare equal (or no keys were specified)
     fall through to the default comparison.  */
  alen = a->length - 1, blen = b->length - 1; 

  if (alen == 0)
    diff = - NONZERO (blen);
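  else if (blen == 0)
    diff = 1;
  /* LC_COLLATE is not "C"/"POSIX": compare using the locale's collation rules. */
  else if (hard_LC_COLLATE)
    diff = xmemcoll (a->text, alen, b->text, blen);
  /* Otherwise compare raw bytes; on a common-prefix tie the shorter line sorts first. */
  else if (! (diff = memcmp (a->text, b->text, MIN (alen, blen))))
    diff = alen < blen ? -1 : alen != blen;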

  return reverse ? -diff : diff;
}
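The last `else` branch of compare() is the byte-wise fallback used when hard_LC_COLLATE is false. As a hedged illustration, here it is pulled out into a standalone C function; `bytewise_line_compare` is a hypothetical name, and the lengths exclude the trailing newline, as `alen`/`blen` do in the excerpt.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the non-locale path of compare():
   plain byte comparison, as used when LC_COLLATE is "C".
   alen/blen are line lengths without the trailing newline. */
static int bytewise_line_compare(const char *a, size_t alen,
                                 const char *b, size_t blen)
{
    if (alen == 0)
        return blen != 0 ? -1 : 0;   /* an empty line sorts first */
    if (blen == 0)
        return 1;

    /* Compare the common prefix byte by byte. */
    size_t n = alen < blen ? alen : blen;
    int diff = memcmp(a, b, n);
    if (diff != 0)
        return diff;

    /* Same prefix: the shorter line sorts first. */
    return alen < blen ? -1 : alen != blen;
}
```

For the lines in test.txt, `● ○` and `○ ●` differ at their third byte (0x8F for ● versus 0x8B for ○ in UTF-8), so this path never treats them as equal.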

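Judging from the sort.c source: in a non-C locale such as a UTF-8 one, hard_LC_COLLATE is true, so lines are compared with xmemcoll, i.e. by the locale's collation rules. Presumably those rules give these symbols no weight, which is why every line compares equal. If you first run
export LC_ALL="C"
then sort compares lines purely by their bytes. Because the variable is exported, uniq run afterwards in the same shell compares bytes as well.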
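One hedged way to see the effect of the locale from C: switch LC_ALL with setlocale() and compare two of the poster's lines with strcoll(), next to a plain strcmp(). `collate_in_locale` and `compare_bytes` are hypothetical helper names. In the "C" locale the two should agree; what strcoll() returns under a UTF-8 locale such as zh_CN.UTF-8 depends on the glibc version and on which locales are installed, so no particular result is claimed for it here.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: set LC_ALL to locale_name, then compare a and b
   with strcoll(). Returns -1, 0 or 1, or -2 if the locale is unavailable. */
static int collate_in_locale(const char *locale_name,
                             const char *a, const char *b)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -2;                  /* locale not installed on this system */
    int r = strcoll(a, b);
    return (r > 0) - (r < 0);       /* normalize to -1, 0 or 1 */
}

/* Byte-wise comparison (strcmp compares as unsigned char), normalized. */
static int compare_bytes(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

For example, `collate_in_locale("zh_CN.UTF-8", a, b)` returns -2 if that locale is not installed; otherwise its sign reflects that locale's collation rules, which you can compare against `compare_bytes(a, b)`.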

About "sort" and "uniq" on Linux