java - Extra EFBFBD bytes in Hadoop thriftfs reading
Hadoop 0.20 includes the thriftfs contrib module, which allows HDFS to be accessed from other programming languages. Hadoop provides an hdfs.py script as a demonstration. The problem is located in its do_get and do_put methods. If I use get to download a UTF-8 text file, it works fine. But when I get a file in any other encoding, I cannot recover the original file: the downloaded file contains many "EFBFBD" bytes. I guess this Java code in HadoopThriftServer may be causing the problem:

    public String read(ThriftHandle tout, long offset, int length) throws ThriftIOException {
      try {
        = now();
        HadoopThriftHandler.LOG.debug("read: " + tout.id + " offset: " + offset + " length: " + length);
        FSDataInputStream in = (FSDataInputStream)lookup(tout.id);
        if (in.getPos() != offset) {
          in.seek(offset);
        }
        byte[] tmp = new byte[length];
        int numbytes = in.read(offset, tmp, 0, length);
        HadoopThriftHandler.LOG....
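
To illustrate where such bytes can come from: EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character. The read() method above returns a String, so the raw file bytes have to be decoded into text somewhere. The excerpt is cut off before that step, so the sketch below only assumes that the decoding uses UTF-8; it is not the server's actual code. The class name ReplacementCharDemo and its helper methods are hypothetical names made up for this demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hadoop or thriftfs.
class ReplacementCharDemo {

    // Decode raw bytes into a String with the given charset, then encode
    // that String back into bytes with the same charset.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        String asText = new String(raw, charset);
        return asText.getBytes(charset);
    }

    // Render bytes as lowercase hex so they can be compared by eye.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8.
        byte[] raw = {0x41, (byte) 0xFF, 0x42};

        // Decoding as UTF-8 replaces the invalid byte with U+FFFD,
        // which is written back out as EF BF BD.
        System.out.println(toHex(roundTrip(raw, StandardCharsets.UTF_8)));      // prints 41efbfbd42

        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        System.out.println(toHex(roundTrip(raw, StandardCharsets.ISO_8859_1))); // prints 41ff42
    }
}
```

If the server does decode with UTF-8, every byte sequence in a non-UTF-8 file that is not valid UTF-8 would be replaced in the same way, which would match the symptom described: UTF-8 files survive the trip and files in other encodings do not.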