adodb: Data Pitfalls 1

Submitted by wd on Sat, 2007-08-11 19:28

While importing dbf data last time, I found another conversion error, this one involving sara am (ำ).
After conversion, most of the text came out correctly, except for some words where sara am was converted into 2 characters:
nikhahit ( ํ ) followed by sara aa ( า ), instead of the single sara am character.
The cause is still unknown (the source data itself may be at fault).
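
One way to test the "bad source data" guess is to look for the two-character form in the raw bytes before any decoding happens. The following is a minimal Python 3 sketch, not part of the original import script. It assumes the dbf fields are TIS-620 byte strings, where 0xED is nikhahit, 0xD2 is sara aa and 0xD3 is sara am. The function name and the sample values are hypothetical.

```python
# TIS-620 byte values: 0xED = nikhahit, 0xD2 = sara aa, 0xD3 = sara am.
DECOMPOSED_SARA_AM = b"\xed\xd2"


def find_decomposed_sara_am(raw):
    """Return the byte offsets where nikhahit + sara aa occurs in raw data."""
    offsets = []
    start = raw.find(DECOMPOSED_SARA_AM)
    while start != -1:
        offsets.append(start)
        start = raw.find(DECOMPOSED_SARA_AM, start + 1)
    return offsets


# Hypothetical field values: the same syllable stored both ways.
good = b"\xa1\xd3"      # ko kai + sara am (single byte)
bad = b"\xa1\xed\xd2"   # ko kai + nikhahit + sara aa (two bytes)

print(find_decomposed_sara_am(good))  # []
print(find_decomposed_sara_am(bad))   # [1]
```

If the offsets list is non-empty for the raw field, the two-character form was already present in the source and was not introduced by the conversion.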

We should therefore check the data during import by adding one more filtering step for sara am, as follows:

...
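def dncode(s):
  # Python 2 byte string: trim, decode TIS-620, re-encode as UTF-8, then merge nikhahit + sara aa into sara am
  return s.strip().strip('\x00').decode('tis620').encode('utf8').replace('\xe0\xb9\x8d\xe0\xb8\xb2','\xe0\xb8\xb3')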
...

Update
In real use, some data outside the ASCII / code page 874 range still slipped through, so the decoding step (decode('tis620')) kept reporting errors.
We have to strip out all of the junk characters.
The final dncode function therefore looks like this:

...
def dncode(s):
  # Strip whitespace and every junk byte in ONE call; chained strip() calls miss junk that arrives in a different order
  junk = ' \t\r\n\x00\xa0\xdb\xdc\xdd\xde\xfc\xfd\xfe\xff'
  return s.strip(junk).decode('tis620').encode('utf8').replace('\xe0\xb9\x8d\xe0\xb8\xb2','\xe0\xb8\xb3')
...

Python evaluates the method chain from left to right, so the junk characters must be stripped first, followed by decode/encode, with the sara am replacement last.
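
For readers on Python 3, where byte strings and text are separate types, the same order of operations can be sketched as below. This is an illustrative port under stated assumptions, not the original function. It deletes the junk bytes anywhere in the field rather than only at the ends, and it returns a str instead of UTF-8 bytes. The helper undecodable_bytes is a hypothetical addition for finding which byte values the codec rejects in a given field.

```python
# Byte values the post strips before decoding (NUL plus bytes that made
# decode('tis620') fail in the author's data).
JUNK = bytes([0x00, 0xA0, 0xDB, 0xDC, 0xDD, 0xDE, 0xFC, 0xFD, 0xFE, 0xFF])


def undecodable_bytes(raw, encoding="tis_620"):
    """Return the sorted byte values in raw that the codec rejects."""
    bad = []
    for value in sorted(set(raw)):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


def dncode(raw):
    """Clean a TIS-620 byte string and return it as text."""
    # Step 1: delete junk bytes wherever they occur, then trim whitespace.
    # Assumption: dropping them mid-field is acceptable for this data.
    cleaned = raw.translate(None, JUNK).strip()
    # Step 2: decode only after the junk is gone, so the codec cannot fail on it.
    text = cleaned.decode("tis_620")
    # Step 3: merge nikhahit (U+0E4D) + sara aa (U+0E32) into sara am (U+0E33).
    return text.replace("\u0e4d\u0e32", "\u0e33")


field = b"  \xa1\xed\xd2\x00\xff "   # hypothetical padded field
print(undecodable_bytes(field))
print(dncode(field).encode("utf-8"))
```

Call .encode('utf-8') on the result, as in the last line, when the destination expects UTF-8 bytes like the original function produced.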
