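Overview
MOBI is an e-book format for mobile devices that is now owned by Amazon. Not every mobile reading app supports it. The mainstream reader is the Kindle, which means buying a Kindle device, although dedicated software for reading this format also exists. It is gradually being replaced by AZW3.
MOBI format description: http://wiki.mobileread.com/wiki/MOBI
kindlereader (Python code for a reader): https://github.com/jiedan/kindlereader
Mobipocket website: http://www.mobipocket.com/dev/default.asp
MOBI readers:
FBReader: http://fbreader.org/win32
mobireader: http://www.techspot.com/downloads.php?action=download_now&id=5612&evp=e4b5913d04404dc99ab0b7bfd806c690&file=1
Kindle for PC: http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000426311
Machine translation of the English text (temporary, to be tidied up when time permits)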
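MOBI is the name given to the format developed for the MobiPocket Reader. Amazon currently uses it with a slightly different DRM scheme and calls it AZW. Amazon uses this extension for files created by KindleGen even though they actually contain both a MOBI format (sometimes called KF7) and a KF8 format inside the same file.
Overview
MOBI is the format used by the MobiPocket Reader and Amazon Kindle readers. It may have a .mobi extension or a .prc extension, and the user can change the extension to either of the accepted forms. In either case the file may or may not be DRM protected. The .prc extension is used because PalmOS does not support any file extensions except .prc and .pdb. Note that Mobipocket prohibits the use of its DRM format on dedicated eBook readers that support other DRM formats. Mobi source files are based on the OEB (Open eBook) standard.
Description
The MOBI format was originally an extension of the PalmDOC format, adding certain HTML-like tags to the data (see EBook HTML). Many MOBI-formatted documents still use this form. However, there is also a high-compression version of the format that compresses data to a greater degree in a proprietary manner. Some third-party programs can read eBooks in the original MOBI format, but only a few can read eBooks in the new compressed form. The higher compression mode uses a Huffman coding scheme that has been called the Huff/cdic algorithm. For a description in Python, see huffcdic.py, available as part of the Calibre project.
Features have been added to the format from time to time, so new files may cause problems if you try to read them with a down-level reader. Currently the source files follow the guidelines of the Open eBook format.
Note that AZW for the Amazon Kindle is the same format as MOBI except that it uses a different DRM scheme. Amazon owns MobiPocket. The format description below applies to both file types.
Format
Like PalmDOC, the Mobipocket file format is that of a standard Palm Database Format file. The header of that format includes the name of the database (usually the book title and sometimes a portion of the author's name), which is up to 31 bytes of data. The files are identified by a Creator ID of MOBI and a Type of BOOK.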
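As a rough illustration of the identification step, the sketch below reads the database name, the type and creator IDs, and the record offsets from the start of a file. The field offsets (name at 0, type at 60, creator at 64, record count at 76, 8-byte record-list entries from 78) come from the generic Palm Database layout and are an assumption here, since this page only mentions the name, type and creator. The function names are hypothetical.

```python
import struct

def parse_pdb_header(data: bytes) -> dict:
    """Read name, type/creator IDs and record offsets from a PDB file.

    The offsets used here follow the generic Palm Database layout
    (assumed; not spelled out in the text above).
    """
    name = data[:32].split(b"\x00", 1)[0]      # NUL-terminated, up to 31 bytes
    file_type, creator = struct.unpack_from(">4s4s", data, 60)
    (record_count,) = struct.unpack_from(">H", data, 76)
    # Each record-list entry is 8 bytes: 4-byte file offset,
    # 1 attribute byte and a 3-byte unique ID.
    offsets = [struct.unpack_from(">I", data, 78 + 8 * i)[0]
               for i in range(record_count)]
    return {"name": name, "type": file_type, "creator": creator,
            "record_offsets": offsets}

def looks_like_mobi(header: dict) -> bool:
    """True when the type/creator pair is BOOK/MOBI."""
    return header["type"] == b"BOOK" and header["creator"] == b"MOBI"
```

Record 0, which the following sections describe, would then be the bytes between the first and second entries of `record_offsets`.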
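Mobipocket provides some minimal file format information, mainly about the HTML encoding used in the text of the book, at http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen (replaced with an archive copy). Also see EBook HTML for the Mobi7 version of HTML.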
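PalmDOC Header
The first record in the Palm Database Format gives more information about the Mobipocket file. The first 16 bytes are almost identical to the first 16 bytes of a PalmDOC format file.
offset | bytes | content | comments |
---|---|---|---|
0 | 2 | Compression | 1 = no compression, 2 = PalmDOC compression, 17480 = HUFF/CDIC compression |
2 | 2 | Unused | Always zero |
4 | 4 | text length | Uncompressed length of the entire text of the book |
8 | 2 | record count | Number of PDB records used for the text of the book. |
10 | 2 | record size | Maximum size of each record containing text, always 4096 |
12 | 4 | Current Position | Current reading position, as an offset into the uncompressed text |
There are two differences from a PalmDOC file. There is an additional compression type (17480), and the Current Position bytes are used for a different purpose:
offset | bytes | content | comments |
---|---|---|---|
12 | 2 | Encryption Type | 0 = no encryption, 1 = Old Mobipocket Encryption, 2 = Mobipocket Encryption |
14 | 2 | Unknown | Usually zero |
The old Mobipocket Encryption scheme only allows the file to be registered with one PID, whereas the current encryption scheme allows multiple PIDs to be used in a single file. Unless specifically mentioned, all the encryption information on this page refers to the current scheme.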
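The 16-byte layout in the two tables above can be unpacked in one call. This is a minimal sketch that assumes big-endian fields (as elsewhere in Palm databases) and uses the Mobipocket interpretation of bytes 12-15. The function and dictionary names are made up for illustration.

```python
import struct

COMPRESSION = {1: "none", 2: "PalmDOC", 17480: "HUFF/CDIC"}
ENCRYPTION = {0: "none", 1: "old Mobipocket", 2: "Mobipocket"}

def parse_palmdoc_header(record0: bytes) -> dict:
    """Decode the first 16 bytes of record 0 (Mobipocket variant)."""
    (compression, _unused, text_length, record_count,
     record_size, encryption, _unknown) = struct.unpack_from(">HHIHHHH", record0, 0)
    return {
        "compression": COMPRESSION.get(compression, f"unknown ({compression})"),
        "text_length": text_length,      # uncompressed length of the whole text
        "record_count": record_count,    # number of text records
        "record_size": record_size,      # normally 4096
        "encryption": ENCRYPTION.get(encryption, f"unknown ({encryption})"),
    }
```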
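MOBI Header
Most Mobipocket files also have a MOBI header in record 0 that follows these 16 bytes, and newer formats also have an EXTH header following the MOBI header, again all in record 0 of the PDB file format.
The MOBI header is of variable length and is not documented. Some fields have been tentatively identified as follows:
offset | hex | bytes | content | comments |
---|---|---|---|---|
16 | 0x10 | 4 | identifier | the characters M O B I |
20 | 0x14 | 4 | header length | the length of the MOBI header, including the previous 4 bytes |
24 | 0x18 | 4 | Mobi type | The kind of Mobipocket file this is: 2 = Mobipocket Book; 3 = PalmDoc Book; 4 = Audio; 232 = mobipocket? generated by kindlegen1.2; 248 = KF8: generated by kindlegen2; 257 = News; 258 = News_Feed; 259 = News_Magazine; 513 = PICS; 514 = WORD; 515 = XLS; 516 = PPT; 517 = TEXT; 518 = HTML |
28 | 0x1c | 4 | text Encoding | 1252 = CP1252 (WinLatin1); 65001 = UTF-8 |
32 | 0x20 | 4 | Unique-ID | Some kind of unique ID number (random?) |
36 | 0x24 | 4 | File version | Version of the Mobipocket format used in this file. |
40 | 0x28 | 4 | Orthographic index | Section number of orthographic meta index. 0xFFFFFFFF if index is not available. |
44 | 0x2c | 4 | Inflection index | Section number of inflection meta index. 0xFFFFFFFF if index is not available. |
48 | 0x30 | 4 | Index names | 0xFFFFFFFF if index is not available. |
52 | 0x34 | 4 | Index keys | 0xFFFFFFFF if index is not available. |
56 | 0x38 | 4 | Extra index 0 | Section number of extra 0 meta index. 0xFFFFFFFF if index is not available. |
60 | 0x3c | 4 | Extra index 1 | Section number of extra 1 meta index. 0xFFFFFFFF if index is not available. |
64 | 0x40 | 4 | Extra index 2 | Section number of extra 2 meta index. 0xFFFFFFFF if index is not available. |
68 | 0x44 | 4 | Extra index 3 | Section number of extra 3 meta index. 0xFFFFFFFF if index is not available. |
72 | 0x48 | 4 | Extra index 4 | Section number of extra 4 meta index. 0xFFFFFFFF if index is not available. |
76 | 0x4c | 4 | Extra index 5 | Section number of extra 5 meta index. 0xFFFFFFFF if index is not available. |
80 | 0x50 | 4 | First Non-book index? | First record number (starting with 0) that is not the book's text |
84 | 0x54 | 4 | Full Name Offset | Offset in record 0 (not from start of file) of the full name of the book |
88 | 0x58 | 4 | Full Name Length | Length in bytes of the full name of the book |
92 | 0x5c | 4 | Locale | Book locale code. Low byte is main language (09 = English), next byte is dialect (08 = British, 04 = US). Thus US English is 1033, UK English is 2057. |
96 | 0x60 | 4 | Input Language | Input language for a dictionary |
100 | 0x64 | 4 | Output Language | Output language for a dictionary |
104 | 0x68 | 4 | Min version | Minimum mobipocket version support needed to read this file. |
108 | 0x6c | 4 | First Image index | First record number (starting with 0) that contains an image. Image records should be sequential. |
112 | 0x70 | 4 | Huffman Record Offset | The record number of the first huffman compression record. |
116 | 0x74 | 4 | Huffman Record Count | The number of huffman compression records. |
120 | 0x78 | 4 | Huffman Table Offset | |
124 | 0x7c | 4 | Huffman Table Length | |
128 | 0x80 | 4 | EXTH flags | Bitfield. If bit 6 (0x40) is set, then there is an EXTH record |
132 | 0x84 | 32 | ? | 32 unknown bytes, if MOBI is long enough |
164 | 0xa4 | 4 | Unknown | Use 0xFFFFFFFF |
168 | 0xa8 | 4 | DRM Offset | Offset to DRM key info in DRMed files. 0xFFFFFFFF if no DRM |
172 | 0xac | 4 | DRM Count | Number of entries in DRM info. 0xFFFFFFFF if no DRM |
176 | 0xb0 | 4 | DRM Size | Number of bytes in DRM info. |
180 | 0xb4 | 4 | DRM Flags | Some flags concerning the DRM info. |
184 | 0xb8 | 8 | Unknown | Bytes to the end of the MOBI header, including the following if the header length >= 228 (244 from start of record). Use 0x0000000000000000. |
192 | 0xc0 | 2 | First content record number | Number of first text record. Normally 1. |
194 | 0xc2 | 2 | Last content record number | Number of last image record, or number of last text record if the file contains no images. Includes Image, DATP, HUFF, DRM. |
196 | 0xc4 | 4 | Unknown | Use 0x00000001. |
200 | 0xc8 | 4 | FCIS record number | |
204 | 0xcc | 4 | Unknown (FCIS record count?) | Use 0x00000001. |
208 | 0xd0 | 4 | FLIS record number | |
212 | 0xd4 | 4 | Unknown (FLIS record count?) | Use 0x00000001. |
216 | 0xd8 | 8 | Unknown | Use 0x0000000000000000. |
224 | 0xe0 | 4 | Unknown | Use 0xFFFFFFFF. |
228 | 0xe4 | 4 | First Compilation data section count | Use 0x00000000. |
232 | 0xe8 | 4 | Number of Compilation data sections | Use 0xFFFFFFFF. |
236 | 0xec | 4 | Unknown | Use 0xFFFFFFFF. |
240 | 0xf0 | 4 | Extra Record Data Flags | A set of binary flags, some of which indicate extra data at the end of each text block. This only seems to be valid for Mobipocket format version 5 and 6 (and higher?), when the header length is 228 (0xE4) or 232 (0xE8). Setting bit 2 (0x2) disables <guide><reference type="start"> functionality. |
244 | 0xf4 | 4 | INDX Record Offset | (If not 0xFFFFFFFF) The record number of the first INDX record created from an ncx file. |
248 | 0xf8 | 4 | Unknown | 0xFFFFFFFF. In new MOBI files the MOBI header length is 256; skip this to reach the EXTH header. |
252 | 0xfc | 4 | Unknown | 0xFFFFFFFF. In new MOBI files the MOBI header length is 256; skip this to reach the EXTH header. |
256 | 0x100 | 4 | Unknown | 0xFFFFFFFF. In new MOBI files the MOBI header length is 256; skip this to reach the EXTH header. |
260 | 0x104 | 4 | Unknown | 0xFFFFFFFF. In new MOBI files the MOBI header length is 256; skip this to reach the EXTH header. |
264 | 0x108 | 4 | Unknown | 0xFFFFFFFF. In new MOBI files the MOBI header length is 256; skip this to reach the EXTH header. |
268 | 0x10c | 4 | Unknown | 0. In new MOBI files the MOBI header length is 256; skip this to reach the EXTH header. The MOBI header length of 256 plus 12 bytes from the PalmDOC header gives this index, 268. |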
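To show how a few of the tentatively identified fields fit together, here is a hedged sketch that reads the header length, type, encoding, full-name location, locale and EXTH flag from record 0. It assumes big-endian values and a header long enough to contain every field it touches. The function name, the returned keys and the CP1252 fallback for unknown encodings are illustrative choices, not part of the format description.

```python
import struct

ENCODINGS = {1252: "cp1252", 65001: "utf-8"}

def parse_mobi_header(record0: bytes) -> dict:
    """Read a handful of MOBI header fields from record 0."""
    if record0[16:20] != b"MOBI":
        raise ValueError("no MOBI identifier at offset 16")
    header_length, mobi_type, encoding, _unique_id, version = \
        struct.unpack_from(">5I", record0, 20)
    # Assumes the header is long enough to hold the fields read below.
    _first_non_book, name_offset, name_length, locale = \
        struct.unpack_from(">4I", record0, 80)
    (first_image,) = struct.unpack_from(">I", record0, 108)
    (exth_flags,) = struct.unpack_from(">I", record0, 128)
    codec = ENCODINGS.get(encoding, "cp1252")   # fallback is an arbitrary choice
    raw_name = record0[name_offset:name_offset + name_length]
    return {
        "header_length": header_length,
        "mobi_type": mobi_type,
        "version": version,
        "encoding": codec,
        "full_name": raw_name.decode(codec, errors="replace"),
        "language": locale & 0xFF,          # low byte: main language
        "dialect": (locale >> 8) & 0xFF,    # next byte: dialect
        "first_image_record": first_image,
        "has_exth": bool(exth_flags & 0x40),
        # The MOBI header starts at offset 16, so whatever follows it
        # starts at 16 + header length.
        "exth_offset": 16 + header_length,
    }
```

For a locale value of 1033 (0x0409) this yields language 9 and dialect 4, matching the US English example in the table.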
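EXTH Header
If the MOBI header indicates that there is an EXTH header, it follows immediately after the MOBI header. Since the MOBI header is of variable length, the EXTH header is not at any fixed offset in record 0. Note that some readers will ignore any EXTH header info if the mobipocket version number specified in the MOBI header is 2 or less (perhaps 3 or less).
The EXTH header is also undocumented, so some of this is guesswork.
bytes | content | comments |
---|---|---|
4 | identifier | the characters E X T H |
4 | header length | the length of the EXTH header, including the previous 4 bytes, but not including the final padding. |
4 | record count | The number of records in the EXTH header. The rest of the EXTH header consists of repeated EXTH records to the end of the EXTH length. |
EXTH record start | Repeat until done. | |
4 | record type | EXTH record type. Just a number identifying what is stored in the record |
4 | record length | length of the EXTH record = L, including the 8 bytes in the type and length fields |
L-8 | record data | Data. |
EXTH record end | Repeat until done. | |
p | padding | Null bytes to pad the EXTH header to a multiple of four bytes (none if the header is already a multiple of four). This padding is not included in the EXTH header length. |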
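The record loop in the table above can be sketched as follows. The function name is hypothetical and the code only walks the declared number of records. It does not interpret the payloads or verify the padding, and it assumes big-endian fields.

```python
import struct

def parse_exth(data: bytes, start: int) -> list:
    """Return (record_type, payload) pairs from the EXTH header at `start`."""
    if data[start:start + 4] != b"EXTH":
        raise ValueError("no EXTH identifier at the given offset")
    _header_length, record_count = struct.unpack_from(">II", data, start + 4)
    records = []
    pos = start + 12                       # first EXTH record
    for _ in range(record_count):
        record_type, record_length = struct.unpack_from(">II", data, pos)
        if record_length < 8:
            raise ValueError("EXTH record shorter than its 8-byte prefix")
        # The payload is the record minus the type and length fields.
        records.append((record_type, data[pos + 8:pos + record_length]))
        pos += record_length
    return records
```

In a real file, `start` would be 16 plus the MOBI header length, since the EXTH header follows the MOBI header directly.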
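There are many different EXTH record types. The ones found so far in Mobipocket files are listed here, with possible meanings. Hopefully the table will be filled in as more information comes to light.
record type | usual length | name | comments | opf meta tag |
---|---|---|---|---|
1 | | drm_server_id | | |
2 | | drm_commerce_id | | |
3 | | drm_ebookbase_book_id | | |
100 | | author | | <dc:Creator> |
101 | | publisher | | <dc:Publisher> |
102 | | imprint | | <Imprint> |
103 | | description | | <dc:Description> |
104 | | isbn | | <dc:Identifier scheme='ISBN'> |
105 | | subject | Could appear multiple times | <dc:Subject> |
106 | | publishingdate | | <dc:Date> |
107 | | review | | <Review> |
108 | | contributor | | <dc:Contributor> |
109 | | rights | | <dc:Rights> |
110 | | subjectcode | | <dc:Subject BASICCode="subjectcode"> |
111 | | type | | <dc:Type> |
112 | | source | | <dc:Source> |
113 | | asin | Kindle Paperwhite labels books with "Personal" if they don't have this record. | |
114 | | versionnumber | | |
115 | 4 | sample | 0x0001 if the book content is only a sample of the full book | |
116 | | startreading | Position (4-byte offset) in file at which to open when first opened | |
117 | 3 | adult | Mobipocket Creator adds this if "Adult only" is checked in its GUI; contents: "yes" | <Adult> |
118 | | retail price | As text, e.g. "4.99" | <SRP> |
119 | | retail price currency | As text, e.g. "USD" | <SRP Currency="currency"> |
121 | 4 | KF8 BOUNDARY Offset | | |
122 | | fixed-layout | "true" | |
123 | | book-type | "comic" | |
124 | | orientation-lock | "none", "portrait", "landscape" | |
125 | 4 | count of resources | | |
126 | | original-resolution | "1072x1448" | |
127 | | zero-gutter | "true" | |
128 | | zero-margin | "true" | |
129 | | Metadata Resource URI | | |
131 | 4 | Unknown | | |
132 | | Unknown | "true" | |
200 | 3 | Dictionary short name | As text | <DictionaryVeryShortName> |
201 | 4 | coveroffset | Add to first image field in Mobi Header to find PDB record containing the cover image | <EmbeddedCover> |
202 | 4 | thumboffset | Add to first image field in Mobi Header to find PDB record containing the thumbnail cover image | |
203 | | hasfakecover | | |
204 | 4 | Creator Software | Known values: 1 = mobigen, 2 = Mobipocket Creator, 200 = kindlegen (Windows), 201 = kindlegen (Linux), 202 = kindlegen (Mac). Warning: Calibre creates fake creator entries, pretending to be a Linux kindlegen 1.2 (201, 1, 2, 33307) for normal ebooks and a non-public Linux kindlegen 2.0 (201, 2, 0, 101) for periodicals. | |
205 | 4 | Creator Major Version | | |
206 | 4 | Creator Minor Version | | |
207 | 4 | Creator Build Number | | |
208 | | watermark | | |
209 | | tamper proof keys | Used by the Kindle (and Android app) for generating book-specific PIDs. | |
300 | | fontsignature | | |
401 | 1 | clippinglimit | Integer percentage of the text allowed to be clipped. Usually 10. | |
402 | | publisherlimit | | |
403 | | Unknown | | |
404 | 1 | ttsflag | 1 = Text to Speech disabled; 0 = Text to Speech enabled | |
405 | 1 | Unknown (Rent/Borrow flag?) | 1 in this field seems to indicate a rental book | |
406 | 8 | Rent/Borrow Expiration Date | If this field is removed from a rental, the book says it expired in 1969 | |
407 | 8 | Unknown | | |
450 | 4 | Unknown | | |
451 | 4 | Unknown | | |
452 | 4 | Unknown | | |
453 | 4 | Unknown | | |
501 | 4 | cdetype | PDOC = Personal Doc; EBOK = ebook; EBSP = ebook sample | |
502 | | lastupdatetime | | |
503 | | updatedtitle | | |
504 | | asin | I found a copy of ASIN in this record. | |
524 | | language | | <dc:language> |
525 | | writingmode | I found horizontal-lr in this record. | |
535 | | Creator Build Number | I found 1019-d6e4792 in this record, which is a build number of Kindlegen 2.7 | |
536 | | Unknown | | |
542 | 4 | Unknown | Some Unix timestamp. | |
547 | | InMemory | String 'I\x00n\x00M\x00e\x00m\x00o\x00r\x00y\x00' found in this record, for KindleGen V2.9 build 1029-0897292 | |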
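Remainder of Record 0
At the end of Record 0 of the PDB file format, we usually find the full file name, the offset of which is given in the MOBI header.
There might be data of unknown use between the end of the EXTH records and the name.
The name is followed by two null bytes and then padded with null bytes to a four-byte boundary. For example, if the name is 16 bytes long, the two null bytes make 18 bytes, and another two null bytes are added to bring the total to 20 bytes. However, the length stored in the header is only 16. If the name were 19 bytes, it would be followed by two null bytes to make 21 bytes, and then padded with three more null bytes to make 24 bytes.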
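The padding arithmetic in the two examples can be written as a one-line calculation. This sketch assumes, as the examples do, that the four-byte boundary is counted from the start of the name. The function name is made up.

```python
def padded_name_size(name_length: int) -> int:
    """Bytes occupied by the full name, its two null bytes and the padding."""
    total = name_length + 2           # name plus the two mandatory null bytes
    return total + (-total % 4)       # round up to a multiple of four
```

With this, a 16-byte name occupies 20 bytes and a 19-byte name occupies 24 bytes, as in the text.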
The name and padding are followed by more data of unknown use, usually null bytes, to the end of section 0.
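Index meta record
The first record of an index contains metadata for the index.
offset | hex | bytes | content | comments |
---|---|---|---|---|
0 | 0x00 | 4 | identifier | the characters I N D X |
4 | 0x04 | 4 | header length | the length of the INDX header, including the previous 4 bytes |
8 | 0x08 | 4 | index type | the type of the index. Known values: 0 = normal index, 2 = inflections |
12 | 0x0c | 4 | ? | ? |
16 | 0x10 | 4 | ? | ? |
20 | 0x14 | 4 | idxt start | the offset to the IDXT section |
24 | 0x18 | 4 | index count | the number of index records |
28 | 0x1c | 4 | index encoding | 1252 = CP1252 (WinLatin1); 65001 = UTF-8 |
32 | 0x20 | 4 | index language | the language code of the index |
36 | 0x24 | 4 | total index count | the number of index entries |
40 | 0x28 | 4 | ordt start | the offset to the ORDT section |
44 | 0x2c | 4 | ligt start | the offset to the LIGT section |
48 | 0x30 | 4 | ? | ? |
52 | 0x34 | 4 | ? | ? |
The remaining INDX header values are unknown.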
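TAGX section
The TAGX section follows the INDX header and is essential for decoding the index values, because it defines how many control bytes an entry contains, which bits correspond to which tag, and how many values a tag requires (most tags require one value, but some require two, maybe more).
offset | hex | bytes | content | comments |
---|---|---|---|---|
0 | 0x00 | 4 | identifier | the characters T A G X |
4 | 0x04 | 4 | header length | the length of the TAGX header, including the previous 4 bytes |
8 | 0x08 | 4 | control byte count | the number of control bytes |
12 | 0x0c | n | tag table | the tag table entries (n = header length - 12, must be a multiple of 4 bytes) |
Each tag table entry is 4 bytes long. The first byte is the tag, the second byte is the number of values, the third byte is the bit mask, and the fourth byte indicates the end of the control byte. If the fourth byte is 0x01, all other bytes of the entry are zero.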
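A minimal sketch of reading the tag table, assuming big-endian length fields and a TAGX section that starts at a known offset. The function name and dictionary keys are hypothetical, and nothing here attempts to apply the table to actual index entries.

```python
import struct

def parse_tagx(data: bytes, start: int = 0) -> dict:
    """Decode the TAGX header and its 4-byte tag table entries."""
    if data[start:start + 4] != b"TAGX":
        raise ValueError("no TAGX identifier at the given offset")
    header_length, control_byte_count = struct.unpack_from(">II", data, start + 4)
    entries = []
    # The tag table fills the rest of the header, four bytes per entry.
    for pos in range(start + 12, start + header_length, 4):
        tag, value_count, bitmask, end_flag = struct.unpack_from("4B", data, pos)
        entries.append({
            "tag": tag,
            "values": value_count,
            "mask": bitmask,
            "end_of_control_byte": end_flag == 0x01,
        })
    return {"control_byte_count": control_byte_count, "entries": entries}
```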
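Variable-width integers
Some parts of the Mobipocket format encode data as variable-width integers. These integers are represented big-endian with 7 bits per byte in bits 1-7. They may be either forward-encoded, in which case only the LSB has bit 8 set, or backward-encoded, in which case only the MSB has bit 8 set. For example, the number 0x11111 would be represented forward-encoded as:
0x04 0x22 0x91
and backward-encoded as:
0x84 0x22 0x11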
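The two encodings can be reproduced with a short sketch. The function names are invented for illustration. The forward decoder reads from a start position towards the end of the data, while the backward decoder reads from the end of the data towards the start, which is how a size field at the end of a record would be consumed.

```python
def encode_varint(value: int, forward: bool = True) -> bytes:
    """Encode a non-negative integer as 7-bit big-endian groups."""
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = []
    while True:
        groups.append(value & 0x7F)
        value >>= 7
        if value == 0:
            break
    groups.reverse()                     # most significant group first
    if forward:
        groups[-1] |= 0x80               # flag on the least significant byte
    else:
        groups[0] |= 0x80                # flag on the most significant byte
    return bytes(groups)

def decode_forward(data: bytes, pos: int = 0) -> tuple:
    """Decode a forward-encoded integer; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:                  # flagged byte ends the integer
            return value, pos

def decode_backward(data: bytes) -> tuple:
    """Decode a backward-encoded integer ending at the last byte of data.

    Returns (value, number of bytes consumed).
    """
    value, shift, pos = 0, 0, len(data)
    while pos > 0:
        pos -= 1
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:                  # flagged byte is the first (MSB) byte
            break
    return value, len(data) - pos
```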
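Trailing entries
The Extra Data Flags field of the MOBI header indicates which trailing entries, if any, are appended to the end of each text record. Each set bit in that field indicates one trailing entry. The entries appear to occur in bit order; for example, trailing entry 1 immediately follows the text content and entry 16 occurs at the very end of the record. The effect and exact details of most of these entries are unknown. The trailing entries indicated by bits 2-16 appear to follow a common format. That format is:
<data><size>
where <size> is the size of the entire trailing entry (including the size of <size>) as a backward-encoded Mobipocket variable-width integer.
Only a few bits have been identified:
bit | Data at end of records |
---|---|
0x0001 | Multibyte character overlap |
0x0002 | Some data to help with indexing |
0x0004 | Some data about uncrossable breaks |
Multibyte character overlap
When bit 1 of the Extra Data Flags field is set, each record is followed by a trailing entry containing any extra bytes necessary to complete a multibyte character that crosses the record boundary. These bytes do not participate in compression, regardless of which compression scheme the file uses. However, unlike the other trailing data bytes, the multibyte bytes (including the count byte) are included in any encryption. The overlapping bytes then reappear as normal content at the beginning of the following record. The trailing entry ends with a byte containing a count of the overlapping bytes plus additional flags.
offset | bytes | content | comments |
---|---|---|---|
0 | 0-3 | N terminal bytes of a multibyte character | |
N | 1 | Size and flags | bits 1-2 encode N, use of bits 3-8 is unknown |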
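Putting the two layouts together, a text record can be trimmed back to its text content by working from the end of the record: first the entries for bits 2-16 (each ending in a backward-encoded size), then the multibyte overlap entry for bit 1. The sketch below is one reading of the description above, not a reference implementation. The function names are hypothetical and the record is assumed to be already decrypted.

```python
def read_backward_varint(data: bytes) -> int:
    """Value of the backward-encoded integer that ends at data[-1]."""
    value, shift, pos = 0, 0, len(data)
    while pos > 0:
        pos -= 1
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:                 # flagged byte is the most significant one
            break
    return value

def strip_trailing_entries(record: bytes, extra_flags: int) -> bytes:
    """Remove the trailing entries announced by the Extra Data Flags field."""
    end = len(record)
    # Bits 2-16 (masks 0x0002..0x8000): the entry for the highest set bit
    # sits last in the record, so remove from the highest bit downwards.
    for bit in range(15, 0, -1):
        if extra_flags & (1 << bit):
            size = read_backward_varint(record[:end])
            if size == 0 or size > end:
                raise ValueError("implausible trailing entry size")
            end -= size
    # Bit 1 (mask 0x0001): N overlap bytes followed by one size/flags byte.
    if extra_flags & 0x0001 and end > 0:
        overlap = record[end - 1] & 0x03
        end -= overlap + 1
    return record[:end]
```

The trimmed bytes are what would then be handed to the decompressor, since the trailing bytes do not take part in compression.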
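PalmDOC Compression
PalmDOC uses LZ77 compression techniques; a PalmDOC implementation can be found on Github. DOC files can contain only compressed text. The format does not allow for any text formatting, which keeps files small, in keeping with the Palm philosophy. However, extensions to the format can use tags, such as HTML or PML, to include formatting within the text. These extensions to PalmDoc are not interchangeable and are the basis for most eBook reader formats on Palm devices.
LZ77 algorithms achieve compression by replacing portions of the data with references to matching data that has already passed through both encoder and decoder. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement "each of the next length characters is equal to the character exactly distance characters behind it in the uncompressed stream." (The "distance" is sometimes called the "offset" instead.)
In the PalmDoc format, a length-distance pair is always encoded by a two-byte sequence. Of the 16 bits that make up these two bytes, 11 bits encode the distance, 3 encode the length, and the remaining two ensure that the decoder can identify the first byte as the beginning of such a two-byte sequence. The exact algorithm needed to decode the compressed text can be found on the PalmDOC page.
PalmDOC data is always divided into 4096-byte blocks (uncompressed size), and the blocks are processed independently; no information from previous or following blocks is needed when a block is compressed or decompressed.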
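For orientation, here is a sketch of a decoder for one block. The byte ranges it uses (0x01-0x08 for literal runs, 0x80-0xBF for length-distance pairs, 0xC0-0xFF for a space followed by a character, and a stored length of actual length minus 3) follow the commonly published PalmDOC description rather than anything stated on this page, so treat them as assumptions. The function name is hypothetical.

```python
def palmdoc_decompress(block: bytes) -> bytes:
    """Decompress one PalmDOC-compressed record (assumed byte ranges)."""
    out = bytearray()
    i = 0
    while i < len(block):
        c = block[i]
        i += 1
        if 1 <= c <= 8:
            # Copy the next c bytes verbatim.
            out += block[i:i + c]
            i += c
        elif c < 0x80:
            # 0x00 and 0x09-0x7F stand for themselves.
            out.append(c)
        elif c >= 0xC0:
            # A space followed by the character c with its top bit cleared.
            out.append(0x20)
            out.append(c ^ 0x80)
        else:
            # 0x80-0xBF: first byte of a two-byte length-distance pair.
            if i >= len(block):
                raise ValueError("truncated length-distance pair")
            pair = (c << 8) | block[i]
            i += 1
            distance = (pair >> 3) & 0x07FF    # 11 bits
            length = (pair & 0x07) + 3         # 3 bits, stored as length - 3
            if distance == 0 or distance > len(out):
                raise ValueError("distance points outside the output")
            for _ in range(length):
                # Byte-by-byte copy so overlapping matches work.
                out.append(out[-distance])
    return bytes(out)
```

Because blocks are independent, each text record can be passed to such a function on its own, after any trailing entries have been removed.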
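PalmDOC does support bookmarks. These pointers are named and refer to an offset location in the file. If the file is edited, these offsets may no longer point to the correct locations. Some reading programs allow the user to enter or edit these bookmarks, while others treat them as a table of contents. Some reading programs may ignore them entirely. They are stored at the end of the file itself, so the whole file needs to be scanned when it is loaded in order to find them.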
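Image Records
If the file contains images, they follow the text blocks, with each image using a single block. The 4096-byte record size in the PalmDoc header applies only to text records; image records may be larger.
Magic Records
In some cases, MobiPocket Creator adds a record of 2 zero bytes after the text records in a file. This record is not included in the "record count" for text records in the PalmDoc header and is also not used as the "first non-book index" in the MOBI header. (If a 2-zero-byte record exists, the index of the subsequent block is used as the "first non-book index".)
MobiPocket Creator also ends the file with three records: "FLIS", "FCIS" and "end-of-file", in that order. The "FLIS" and "FCIS" records do not appear to be necessary for MobiPocket Reader or the Amazon Kindle 2 to read the file. The "end-of-file" record may be necessary.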
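FLIS Record
The FLIS record appears to have fixed values. The meaning of the values is unknown.
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | the characters F L I S (0x46 0x4c 0x49 0x53) |
4 | 4 | ? | fixed value: 8 |
8 | 2 | ? | fixed value: 65 |
10 | 2 | ? | fixed value: 0 |
12 | 4 | ? | fixed value: 0 |
16 | 4 | ? | fixed value: -1 (0xFFFFFFFF) |
20 | 2 | ? | fixed value: 1 |
22 | 2 | ? | fixed value: 3 |
24 | 4 | ? | fixed value: 3 |
28 | 4 | ? | fixed value: 1 |
32 | 4 | ? | fixed value: -1 (0xFFFFFFFF) |
FCIS Record
The FCIS record appears to have mostly fixed values.
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | the characters F C I S (0x46 0x43 0x49 0x53) |
4 | 4 | ? | fixed value: 20 |
8 | 4 | ? | fixed value: 16 |
12 | 4 | ? | fixed value: 1 |
16 | 4 | ? | fixed value: 0 |
20 | 4 | ? | text length (the same value as "text length" in the PalmDoc header) |
24 | 4 | ? | fixed value: 0 |
28 | 4 | ? | fixed value: 32 |
32 | 4 | ? | fixed value: 8 |
36 | 2 | ? | fixed value: 1 |
38 | 2 | ? | fixed value: 1 |
40 | 4 | ? | fixed value: 0 |
End-of-file Record
The end-of-file record is a fixed 4-byte record. While the last two bytes appear to be a CRLF marker, the meaning of the first two bytes is unknown.
offset | bytes | content | comments |
---|---|---|---|
0 | 1 | ? | fixed value: 233 (0xe9) |
1 | 1 | ? | fixed value: 142 (0x8e) |
2 | 1 | ? | fixed value: 13 (0x0d) |
3 | 1 | ? | fixed value: 10 (0x0a) |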
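Compilation Records
KindleGen creates records for the compilation source (KindleGen 1.2-2.5), or for both the compilation source and the compiler output (KindleGen 2.7 and later). These records are placed before the end-of-file record (KindleGen versions 1.2-2.2) or before the BOUNDARY record (KindleGen version 2.3 and later).
MOBI files created with Mobipocket Creator, Amazon's Personal Documents Service or Kindle Direct Publishing (formerly Amazon DTP) do not contain SRCS records. In the past, kindlegen had an undocumented option to suppress this record, but the option was removed in 2010.
The SRCS record is a record whose content is a zip archive of all the source files (i.e. .opf, .ncx, .htm, .jpg, etc.) given to the command and put into the resulting MOBI file. The record begins with the "SRCS" signature, as follows:
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | "SRCS" (0x53 0x52 0x43 0x53) |
4 | 4 | ? | fixed value (?): 0x00000010 |
8 | 4 | ? | fixed value (?): 0x0000002f |
12 | 4 | ? | fixed value (?): 0x00000001 |
16 | | zip | the zip archive continues to the end of this record |
The CMET record is a record whose content is the output of the compilation operation and perhaps additional information. The record begins with the "CMET" signature, as follows:
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | "CMET" (0x43 0x4D 0x45 0x54) |
4 | 4 | ? | fixed value (?): 0x0000000C |
8 | 4 | text length | (big-endian) |
12 | variable | text | compilation output text, line endings are CRLF |
variable | variable | ? | unknown data to the end of the record |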
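Media Records (AUDI/VIDE)
kindlegen supports embedded audio and video for some Kindle platforms. Each media file is stored in a separate AUDI (audio) or VIDE (video) record.
A media record looks like this:
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | "AUDI" (0x41 0x55 0x44 0x49) or "VIDE" (0x56 0x49 0x44 0x45) |
4 | 4 | ? | unknown value |
8 | 4 | ? | unknown value |
12 | | media | the media data continues to the end of this record |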
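MBP
This is the extension used for a side (auxiliary) file that accompanies eBooks in the MOBI format. It stores metadata used by the library software as well as user-entered data such as bookmarks, annotations and the last read position. The file is created automatically by the reader program when the eBook is first opened and has the extension .mbp. The library management software in MobiPocket uses this file to obtain the information displayed in the library window, such as title, author and description, so that it does not have to open the larger eBook file.
There is an effort underway to describe the binary MBP file format. There is also an mbp reader program that can extract the annotations from an mbp file.
eBook creation
There are multiple ways to create eBooks in the MOBI format. The documents on the MobiPocket web site detail the rules for the source file format needed to create an eBook in MOBI. The recommended tool, called MobiPocket Creator, is available for download from the web site.
eBooks can also be converted from other forms using the Windows version of the MobiPocket Reader. Once converted, the file can be used on any device that is supported by a MobiPocket Reader.
Guidelines
In order to better support the features of the MobiPocket Reader, there are some guidelines that should be followed when creating a book in this format.
- Do not specify a default font family, font size or other font attributes such as weight or color. This is a choice that should be left to the person reading the eBook. Font sizes and attributes can be specified for special titles and other specific items. Use only generic font families.
- Do not impose justification on the standard text. Titles and other special text may need it.
- Do not use tables for anything other than table data. Nested tables are not supported.
- Do not use blank lines to try to force a page change. Use the <mbp:pagebreak/> tag.
- Do not make multiple books for different devices. Instead, use advanced features such as multi-resolution images and platform-specific frames.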
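Adapting images to the various PDA screen resolutions
Note that the following section applies only to the original mobi format and is not used for Amazon AZW files.
The IMG tag in a Mobipocket publication supports up to three source attributes for different resolutions: src, losrc and hisrc. This makes it possible to optimise the same eBook for different devices. The image displayed is chosen dynamically by the Reader according to the resolution of the actual device screen:
Attribute | Min dimension of screen | Example devices |
---|---|---|
losrc | <= 239 pixels | Low-resolution 160x160 Palm devices (PalmVx, Treo 600, Zire); smartphones (Nokia 3650, Sony Ericsson P800/900, Microsoft Smartphones) |
src | >= 240 pixels (handhelds) | Pocket PC, high-resolution Palm devices (Sony Clie, Tungsten, Zire 71) |
hisrc | >= 480 pixels | Any desktop or tablet PC |
Example:
<img hisrc="cover480x640.gif" src="cover220x300.gif" losrc="cover140x140.gif"/>
Also note that there is an internal limit of 63KB for images (a limitation of the Mobipocket .PRC format). GIFs must be smaller than 63KB. You can use a GIF optimization program such as Ulead Smart Saver to get GIFs under 63KB. (If images are larger than 63KB, MobiGEN automatically resizes them to fit within the limit, but you may not like the result.) JPEG images will use lower quality settings to shrink the file size without reducing the pixel dimensions.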
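HTML and CSS tips for MOBI creation
- The Kindle and Kindle DX do not handle the soft hyphen HTML entity correctly. Use the <shy/> tag instead.
- Grey text is displayed as white on some devices. To avoid this problem, add:
@media amzn-mobi and (monochrome) {
  .mygreytextclass {
    color: black;
  }
}
- The kindlegen tool ignores padding-left. If necessary, you can work around this by adding an element inside that element and setting its margin-left.
- The kindlegen tool's CSS parser is sometimes buggy. So if you have:
div.foo p {
...
}
the kindlegen tool often wrongly interprets this as:
div.foo, div.foo p {
...
}
If you want to apply a style only to the inner tag, you must add a custom class to the paragraphs inside the outer tag and use that CSS selector by itself.
For additional tips on developing content for KF8-capable readers, see KF8 CSS Tips.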
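Format limitations
The MOBI format has a number of limitations. Some are listed here.
- The right-hand margin of a block of text can never be larger than the normal margin.
- The left-hand margin can only be specified in 1em increments. Text can only have a hanging indent if it has no left margin. Recent Kindle renderers have increased the left margin increment to approximately 2em.
- Text cannot flow around images that are taller than one line of text.
- Image sizes cannot scale with the font size.
- In some, but not all, Mobipocket renderers, text with a left margin changes its margin value on each line depending on the font size in force when the previous line wrap occurred.
- Many measurement values, such as the indent of a hanging indent, cannot be specified in ems.
- Individual text items cannot be displayed in a monospace font.
- Tables display very differently on different Mobipocket renderers, particularly tables that span more than one screen.
- Nested tables are not supported at all.
- In addition, you only get the full formatting power of Mobipocket if you write markup that uses Mobipocket's non-standard, extended and poorly documented implementation of HTML 3.2. See the file tag reference on the mobipocket website.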
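Mobi DRM
Mobi DRM can optionally be applied to this file format. The standard scheme is supported by Mobipocket and Overdrive servers. It is based on an ID (the PID) derived from the reading device or program. The server must know this PID when the eBook is purchased; it is embedded in the file, which is then locked to the device. The licensing scheme does allow multiple devices to be supported (typically up to 4). In this case the server needs to know the device IDs of all the devices. If you add a device, you must tell the server and re-download the eBook to be able to read it on the new device. Adding a device or re-downloading the eBook is typically free. If a dealer goes out of business, you may not be able to add a device, since you can no longer re-download the file.
A second, simpler scheme needs only the account login and password that were used to purchase the eBook. Once this data is entered, the eBook can be read. The data is entered only once for each device. This is a new scheme, and some readers may not support it.
A third method, used on some eBooks, is the generic MOBI key. The file is encrypted, but only with the generic MOBI key rather than a PID-specific key. This means the book can be read by any MobiPocket Reader software on any device, but not by any non-MobiPocket software.
DRM applies only to the eBook itself and not to the metadata. The metadata can be read by library routines without unlocking the eBook. Some programs can even change this information without touching the DRM portion of the file.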
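MOBI eBook readers and converters
In addition to the reader supplied by MobiPocket, there are third-party readers and converters. These include:
- Calibre
- Stanza
- FBReader
- Book Designer
- 图书媒体
- STDU Viewer
- Sumatra PDF
- MBP_reader (a program that can extract MBP annotations to a text file).
- Kindle for PC or Mac
- EPUB to Kindle Converter
- PDF/ePUB to Kindle Tool
- Kindle Book Development Tools
- KindleUnpack - previously known as MobiUnpack. KindleUnpack breaks a mobi file down into its original forms. Also known as Mobi decoder.
- PDF to ePUB/Mobi Converter
- KindleGen - the official Amazon tool to convert ePub to Mobi (AZW) or otherwise generate the Mobi format.
MOBI eBook hardware readers
- Bookeen Cybook Gen3
- Bookeen Cybook Opus
- Hanlin V3 / Bebook / EZ Reader
- iRex iLiad
- iRex Digital Reader
- Amazon Kindle readers
- Onyx BOOX readers
Not all eBook readers that support the Mobi format have the same capabilities. Check Mobi Comparison for details on actual support.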
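Creating a MOBI file from an ePub file
Here is one method of creating a mobi file from an ePub file.
- Make sure that you have used headings h1-h2-h3 only for the entries you want in the TOC (the reason becomes clear later; alternatively, use the tenth step below);
- Make your entire ePub in Sigil, importing the HTML files with the "Add existing items" option.
- Finalize your ePUB; do not add a cover page.
- Download MobiPocket Creator and install it.
- Unzip your ePUB.
- Double-click the OPF.
- The book will open in MobiPocket Creator.
- Drag and drop your cover into MobiPocket Creator.
- Use MobiPocket Creator to make an html.TOC with only headings 1-2-3, or,
- Alternatively: point MobiPocket Creator at an existing html.toc by editing the "Guide Properties" section. (Note: the toc.ncx will already be in the relevant folder inside the "My Publications" folder in the MobiPocket Creator directory.)
- Click "Build".
- You now have a fully functioning PRC file.
For more information
- Mobipocket Creator - free download; also see MobiPocket Creator
- Mobipocket Developer Center - creation documents
- Content Generation - see paragraph formatting for CSS-like capability, although there is no CSS in MOBI.
- Amazon KindleGen - an upgraded version of mobiGen, but for mobi books.
- KindleGen - our wiki page on using KindleGen.
- MobileRead forum - Mobi unpack, which takes apart a mobi file.
- Java Mobi Metadata Editor - edit, add and remove EXTH tags in mobi files.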
Original text
MOBI is the name given to the format developed for the MobiPocket Reader. It is currently used by Amazon with a slightly different DRM scheme and called AZW. Amazon uses this extension for files created by KindleGen even though they actually have both a MOBI format, sometimes called KF7, and a KF8 format inside the same file.
Overview
MOBI is the format used by the MobiPocket Reader and Amazon Kindle Readers. It may have a .mobi extension or it may have a .prc extension. The extension can be changed by the user to either of the accepted forms. In either case it may be DRM protected or non-DRM. The .prc extension is used because the PalmOS doesn't support any file extensions except .prc or .pdb. Note that Mobipocket prohibits their DRM format to be used on dedicated eBook readers that support other DRM formats. Mobi source files are based on the OEB, Open eBook standard.
Description
MOBI format was originally an extension of the PalmDOC format by adding certain HTML like tags to the data (See EBook HTML). Many MOBI formatted documents still use this form. However there is also a high compression version of this file format that compresses data to a larger degree in a proprietary manner. There are some third party programs that can read the eBooks in the original MOBI format but there are only a few third party programs that can read the eBooks in the new compressed form. The higher compression mode is using a Huffman coding scheme that has been called the Huff/cdic algorithm. For a description in Python check huffcdic.py available as part of the Calibre project.
From time to time features have been added to the format so new files may have problems if you try to read them with a down level reader. Currently the source files follow the guidelines in the Open eBook format.
Note that AZW for the Amazon Kindle is the same format as MOBI except that it uses a different DRM scheme. Amazon owns MobiPocket. The format description below applies to both file types.
Format
Like PalmDOC, the Mobipocket file format is that of a standard Palm Database Format file. The header of that format includes the name of the database (usually the book title and sometimes a portion of the authors name) which is up to 31 bytes of data. The files are identified as Creator ID of MOBI and a Type of BOOK.
Mobipocket have some minimal file format info, mainly about the HTML encoding they use in the text of the book, at http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen (replaced with archive copy). Also see EBook HTML for Mobi7 version of HTML.
PalmDOC Header
The first record in the Palm Database Format gives more information about the Mobipocket file. The first 16 bytes are almost identical to the first sixteen bytes of a PalmDOC format file.
offset | bytes | content | comments |
---|---|---|---|
0 | 2 | Compression | 1 == no compression, 2 = PalmDOC compression, 17480 = HUFF/CDIC compression |
2 | 2 | Unused | Always zero |
4 | 4 | text length | Uncompressed length of the entire text of the book |
8 | 2 | record count | Number of PDB records used for the text of the book. |
10 | 2 | record size | Maximum size of each record containing text, always 4096 |
12 | 4 | Current Position | Current reading position, as an offset into the uncompressed text |
There are two differences from a Palm DOC file. There's an additional compression type (17480), and the Current Position bytes are used for a different purpose:
offset | bytes | content | comments |
---|---|---|---|
12 | 2 | Encryption Type | 0 == no encryption, 1 = Old Mobipocket Encryption, 2 = Mobipocket Encryption |
14 | 2 | Unknown | Usually zero |
The old Mobipocket Encryption scheme only allows the file to be registered with one PID, unlike the current encryption scheme that allows multiple PIDs to be used in a single file. Unless specifically mentioned, all the encryption information on this page refers to the current scheme.
MOBI Header
Most Mobipocket files also have a MOBI header in record 0 that follows these 16 bytes, and newer formats also have an EXTH header following the MOBI header, again all in record 0 of the PDB file format.
The MOBI header is of variable length and is not documented. Some fields have been tentatively identified as follows:
offset | hex | bytes | content | comments |
---|---|---|---|---|
16 | 0x10 | 4 | identifier | the characters M O B I |
20 | 0x14 | 4 | header length | the length of the MOBI header, including the previous 4 bytes |
24 | 0x18 | 4 | Mobi type | The kind of Mobipocket file this is: 2 = Mobipocket Book; 3 = PalmDoc Book; 4 = Audio; 232 = mobipocket? generated by kindlegen1.2; 248 = KF8: generated by kindlegen2; 257 = News; 258 = News_Feed; 259 = News_Magazine; 513 = PICS; 514 = WORD; 515 = XLS; 516 = PPT; 517 = TEXT; 518 = HTML |
28 | 0x1c | 4 | text Encoding | 1252 = CP1252 (WinLatin1); 65001 = UTF-8 |
32 | 0x20 | 4 | Unique-ID | Some kind of unique ID number (random?) |
36 | 0x24 | 4 | File version | Version of the Mobipocket format used in this file. |
40 | 0x28 | 4 | Orthographic index | Section number of orthographic meta index. 0xFFFFFFFF if index is not available. |
44 | 0x2c | 4 | Inflection index | Section number of inflection meta index. 0xFFFFFFFF if index is not available. |
48 | 0x30 | 4 | Index names | 0xFFFFFFFF if index is not available. |
52 | 0x34 | 4 | Index keys | 0xFFFFFFFF if index is not available. |
56 | 0x38 | 4 | Extra index 0 | Section number of extra 0 meta index. 0xFFFFFFFF if index is not available. |
60 | 0x3c | 4 | Extra index 1 | Section number of extra 1 meta index. 0xFFFFFFFF if index is not available. |
64 | 0x40 | 4 | Extra index 2 | Section number of extra 2 meta index. 0xFFFFFFFF if index is not available. |
68 | 0x44 | 4 | Extra index 3 | Section number of extra 3 meta index. 0xFFFFFFFF if index is not available. |
72 | 0x48 | 4 | Extra index 4 | Section number of extra 4 meta index. 0xFFFFFFFF if index is not available. |
76 | 0x4c | 4 | Extra index 5 | Section number of extra 5 meta index. 0xFFFFFFFF if index is not available. |
80 | 0x50 | 4 | First Non-book index? | First record number (starting with 0) that's not the book's text |
84 | 0x54 | 4 | Full Name Offset | Offset in record 0 (not from start of file) of the full name of the book |
88 | 0x58 | 4 | Full Name Length | Length in bytes of the full name of the book |
92 | 0x5c | 4 | Locale | Book locale code. Low byte is main language 09= English, next byte is dialect, 08 = British, 04 = US. Thus US English is 1033, UK English is 2057. |
96 | 0x60 | 4 | Input Language | Input language for a dictionary |
100 | 0x64 | 4 | Output Language | Output language for a dictionary |
104 | 0x68 | 4 | Min version | Minimum mobipocket version support needed to read this file. |
108 | 0x6c | 4 | First Image index | First record number (starting with 0) that contains an image. Image records should be sequential. |
112 | 0x70 | 4 | Huffman Record Offset | The record number of the first huffman compression record. |
116 | 0x74 | 4 | Huffman Record Count | The number of huffman compression records. |
120 | 0x78 | 4 | Huffman Table Offset | |
124 | 0x7c | 4 | Huffman Table Length | |
128 | 0x80 | 4 | EXTH flags | bitfield. if bit 6 (0x40) is set, then there's an EXTH record |
132 | 0x84 | 32 | ? | 32 unknown bytes, if MOBI is long enough |
164 | 0xa4 | 4 | Unknown | Use 0xFFFFFFFF |
168 | 0xa8 | 4 | DRM Offset | Offset to DRM key info in DRMed files. 0xFFFFFFFF if no DRM |
172 | 0xac | 4 | DRM Count | Number of entries in DRM info. 0xFFFFFFFF if no DRM |
176 | 0xb0 | 4 | DRM Size | Number of bytes in DRM info. |
180 | 0xb4 | 4 | DRM Flags | Some flags concerning the DRM info. |
184 | 0xb8 | 8 | Unknown | Bytes to the end of the MOBI header, including the following if the header length >= 228 (244 from start of record). Use 0x0000000000000000. |
192 | 0xc0 | 2 | First content record number | Number of first text record. Normally 1. |
194 | 0xc2 | 2 | Last content record number | Number of last image record or number of last text record if it contains no images. Includes Image, DATP, HUFF, DRM. |
196 | 0xc4 | 4 | Unknown | Use 0x00000001. |
200 | 0xc8 | 4 | FCIS record number | |
204 | 0xcc | 4 | Unknown (FCIS record count?) | Use 0x00000001. |
208 | 0xd0 | 4 | FLIS record number | |
212 | 0xd4 | 4 | Unknown (FLIS record count?) | Use 0x00000001. |
216 | 0xd8 | 8 | Unknown | Use 0x0000000000000000. |
224 | 0xe0 | 4 | Unknown | Use 0xFFFFFFFF. |
228 | 0xe4 | 4 | First Compilation data section count | Use 0x00000000. |
232 | 0xe8 | 4 | Number of Compilation data sections | Use 0xFFFFFFFF. |
236 | 0xec | 4 | Unknown | Use 0xFFFFFFFF. |
240 | 0xf0 | 4 | Extra Record Data Flags | A set of binary flags, some of which indicate extra data at the end of each text block. This only seems to be valid for Mobipocket format version 5 and 6 (and higher?), when the header length is 228 (0xE4) or 232 (0xE8). Setting bit 2 (0x2) disables <guide><reference type="start"> functionality. |
244 | 0xf4 | 4 | INDX Record Offset | (If not 0xFFFFFFFF)The record number of the first INDX record created from an ncx file. |
248 | 0xf8 | 4 | Unknown | 0xFFFFFFFF In new MOBI file, the MOBI header length is 256, skip this to EXTH header. |
252 | 0xfc | 4 | Unknown | 0xFFFFFFFF In new MOBI file, the MOBI header length is 256, skip this to EXTH header. |
256 | 0x100 | 4 | Unknown | 0xFFFFFFFF In new MOBI file, the MOBI header length is 256, skip this to EXTH header. |
260 | 0x104 | 4 | Unknown | 0xFFFFFFFF In new MOBI file, the MOBI header length is 256, skip this to EXTH header. |
264 | 0x108 | 4 | Unknown | 0xFFFFFFFF In new MOBI file, the MOBI header length is 256, skip this to EXTH header. |
268 | 0x10c | 4 | Unknown | 0 In new MOBI file, the MOBI header length is 256, skip this to EXTH header, MOBI Header length 256, and add 12 bytes from PalmDOC Header so this index is 268. |
EXTH Header
If the MOBI header indicates that there's an EXTH header, it follows immediately after the MOBI header. Since the MOBI header is of variable length, this isn't at any fixed offset in record 0. Note that some readers will ignore any EXTH header info if the mobipocket version number specified in the MOBI header is 2 or less (perhaps 3 or less).
The EXTH header is also undocumented, so some of this is guesswork.
bytes | content | comments |
---|---|---|
4 | identifier | the characters E X T H |
4 | header length | the length of the EXTH header, including the previous 4 bytes - but not including the final padding. |
4 | record Count | The number of records in the EXTH header. the rest of the EXTH header consists of repeated EXTH records to the end of the EXTH length. |
EXTH record start | Repeat until done. | |
4 | record type | Exth Record type. Just a number identifying what's stored in the record |
4 | record length | length of EXTH record = L , including the 8 bytes in the type and length fields |
L-8 | record data | Data. |
EXTH record end | Repeat until done. | |
p | padding | Null bytes to pad the EXTH header to a multiple of four bytes (none if the header is already a multiple of four). This padding is not included in the EXTH header length. |
There are lots of different EXTH Records types. Ones found so far in Mobipocket files are listed here, with possible meanings. Hopefully the table will be filled in as more information comes to light.
record type | usual length | name | comments | opf meta tag |
---|---|---|---|---|
1 | drm_server_id | |||
2 | drm_commerce_id | |||
3 | drm_ebookbase_book_id | |||
100 | author | <dc:Creator> | ||
101 | publisher | <dc:Publisher> | ||
102 | imprint | <Imprint> | ||
103 | description | <dc:Description> | ||
104 | isbn | <dc:Identifier scheme='ISBN'> | ||
105 | subject | Could appear multiple times | <dc:Subject> | |
106 | publishingdate | <dc:Date> | ||
107 | review | <Review> | ||
108 | contributor | <dc:Contributor> | ||
109 | | rights | | <dc:Rights> |
110 | | subjectcode | | <dc:Subject BASICCode="subjectcode"> |
111 | | type | | <dc:Type> |
112 | | source | | <dc:Source> |
113 | | asin | Kindle Paperwhite labels books with "Personal" if they don't have this record. | |
114 | | versionnumber | | |
115 | 4 | sample | 0x0001 if the book content is only a sample of the full book | |
116 | | startreading | Position (4-byte offset) in file at which to open when first opened | |
117 | 3 | adult | Mobipocket Creator adds this if Adult only is checked on its GUI; contents: "yes" | <Adult> |
118 | | retail price | As text, e.g. "4.99" | <SRP> |
119 | | retail price currency | As text, e.g. "USD" | <SRP Currency="currency"> |
121 | 4 | KF8 BOUNDARY Offset | | |
122 | | fixed-layout | "true" | |
123 | | book-type | "comic" | |
124 | | orientation-lock | "none", "portrait", "landscape" | |
125 | 4 | count of resources | | |
126 | | original-resolution | "1072x1448" | |
127 | | zero-gutter | "true" | |
128 | | zero-margin | "true" | |
129 | | Metadata Resource URI | | |
131 | 4 | Unknown | | |
132 | | Unknown | "true" | |
200 | 3 | Dictionary short name | As text | <DictionaryVeryShortName> |
201 | 4 | coveroffset | Add to first image field in Mobi Header to find PDB record containing the cover image | <EmbeddedCover> |
202 | 4 | thumboffset | Add to first image field in Mobi Header to find PDB record containing the thumbnail cover image | |
203 | | hasfakecover | | |
204 | 4 | Creator Software | Known Values: 1=mobigen, 2=Mobipocket Creator, 200=kindlegen (Windows), 201=kindlegen (Linux), 202=kindlegen (Mac). Warning: Calibre creates fake creator entries, pretending to be a Linux kindlegen 1.2 (201, 1, 2, 33307) for normal ebooks and a non-public Linux kindlegen 2.0 (201, 2, 0, 101) for periodicals. | |
205 | 4 | Creator Major Version | | |
206 | 4 | Creator Minor Version | | |
207 | 4 | Creator Build Number | | |
208 | | watermark | | |
209 | | tamper proof keys | Used by the Kindle (and Android app) for generating book-specific PIDs. | |
300 | | fontsignature | | |
401 | 1 | clippinglimit | Integer percentage of the text allowed to be clipped. Usually 10. | |
402 | | publisherlimit | | |
403 | | Unknown | | |
404 | 1 | ttsflag | 1 - Text to Speech disabled; 0 - Text to Speech enabled | |
405 | 1 | Unknown (Rent/Borrow flag?) | 1 in this field seems to indicate a rental book | |
406 | 8 | Rent/Borrow Expiration Date | If this field is removed from a rental, the book says it expired in 1969 | |
407 | 8 | Unknown | | |
450 | 4 | Unknown | | |
451 | 4 | Unknown | | |
452 | 4 | Unknown | | |
453 | 4 | Unknown | | |
501 | 4 | cdetype | PDOC - Personal Doc; EBOK - ebook; EBSP - ebook sample; | |
502 | | lastupdatetime | | |
503 | | updatedtitle | | |
504 | | asin | I found a copy of ASIN in this record. | |
524 | | language | | <dc:language> |
525 | | writingmode | I found horizontal-lr in this record. | |
535 | | Creator Build Number | I found 1019-d6e4792 in this record, which is a build number of Kindlegen 2.7 | |
536 | | Unknown | | |
542 | 4 | Unknown | Some Unix timestamp. | |
547 | | InMemory | String 'I\x00n\x00M\x00e\x00m\x00o\x00r\x00y\x00' found in this record, for KindleGen V2.9 build 1029-0897292 | |
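As a rough illustration of how a few of the records above might be interpreted, the sketch below assumes the EXTH block has already been parsed into a dict that maps record type to raw bytes. The dict, the function names and the `first_image_index` argument are hypothetical, and the values are assumed to be big-endian 32-bit integers, as elsewhere in the format.

```python
import struct

# Names taken from the "Known Values" list for record 204.
CREATOR_NAMES = {
    1: "mobigen",
    2: "Mobipocket Creator",
    200: "kindlegen (Windows)",
    201: "kindlegen (Linux)",
    202: "kindlegen (Mac)",
}


def exth_u32(exth, record_type):
    """Return a 4-byte EXTH value as an unsigned int, or None if absent."""
    raw = exth.get(record_type)
    if raw is None or len(raw) != 4:
        return None
    return struct.unpack(">I", raw)[0]  # assumption: big-endian


def cover_record_index(exth, first_image_index):
    """Record 201 is added to the MOBI header's first image index."""
    offset = exth_u32(exth, 201)
    if offset is None:
        return None
    return first_image_index + offset


def creator_info(exth):
    """Decode records 204-207 (creator software, version and build)."""
    software = exth_u32(exth, 204)
    return {
        "software": CREATOR_NAMES.get(software, software),
        "version": (exth_u32(exth, 205), exth_u32(exth, 206)),
        "build": exth_u32(exth, 207),
    }
```

Given the warning on record 204, a result of `kindlegen (Linux)` with version `(1, 2)` and build `33307` may equally indicate a file written by Calibre.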
Remainder of Record 0
At the end of Record 0 of the PDB file format, we usually get the full file name, the offset of which is given in the MOBI header.
There might be data of unknown use between the end of the EXTH records and the name.
The name is followed by two null bytes, and then padded with null bytes to a four-byte boundary. For example, if the name is 16 bytes long, with two null bytes, that makes 18 bytes, and it then gets another two null bytes added to make it up to 20 bytes in total. However, the length stored in the header is only 16. If the name was 19 bytes, it would be followed by two null bytes to make it up to 21 bytes, and then padded with three more null bytes to make it up to 24 bytes.
The name and padding are followed by more data of unknown use, usually null bytes, to the end of Record 0.
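The padding rule for the name can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative, and `name_offset` and `name_length` stand for the values stored in the MOBI header.

```python
def padded_name_length(name_length):
    """Bytes occupied by the name, its two null bytes and the padding."""
    with_terminator = name_length + 2      # two null bytes always follow
    return (with_terminator + 3) & ~3      # round up to a multiple of four


def read_full_name(record0, name_offset, name_length):
    """Slice the raw name bytes out of Record 0 (padding excluded)."""
    return record0[name_offset:name_offset + name_length]
```

With the two examples from the text, `padded_name_length(16)` gives 20 and `padded_name_length(19)` gives 24.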
Index meta record
The first record of an index contains the metadata of the index.
offset | hex | bytes | content | comments |
---|---|---|---|---|
0 | 0x00 | 4 | Identifier | the characters I N D X |
4 | 0x04 | 4 | header length | the length of the INDX header, including the previous 4 bytes |
8 | 0x08 | 4 | index type | the type of the index. Known values: 0 - normal index, 2 - inflections |
12 | 0x0c | 4 | ? | ? |
16 | 0x10 | 4 | ? | ? |
20 | 0x14 | 4 | idxt start | the offset to the IDXT section |
24 | 0x18 | 4 | index count | the number of index records |
28 | 0x1c | 4 | index encoding | 1252 = CP1252 (WinLatin1); 65001 = UTF-8 |
32 | 0x20 | 4 | index language | the language code of the index |
36 | 0x24 | 4 | total index count | the number of index entries |
40 | 0x28 | 4 | ordt start | the offset to the ORDT section |
44 | 0x2c | 4 | ligt start | the offset to the LIGT section |
48 | 0x30 | 4 | ? | ? |
52 | 0x34 | 4 | ? | ? |
The remaining INDX header values are unknown.
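The known part of the INDX header can be read with a single `struct` call. The sketch below is only an illustration: the field names are made up from the table above, the four unknown fields are kept as plain integers, and big-endian byte order is assumed.

```python
import struct
from collections import namedtuple

IndxHeader = namedtuple("IndxHeader", [
    "identifier", "header_length", "index_type", "unknown_0c", "unknown_10",
    "idxt_start", "index_count", "index_encoding", "index_language",
    "total_index_count", "ordt_start", "ligt_start", "unknown_30", "unknown_34",
])

# 4-byte identifier followed by thirteen 32-bit fields: offsets 0x00-0x37.
INDX_STRUCT = struct.Struct(">4s13I")


def parse_indx_header(record):
    """Parse the first 56 bytes of an index meta record."""
    if len(record) < INDX_STRUCT.size:
        raise ValueError("record too short for an INDX header")
    header = IndxHeader._make(INDX_STRUCT.unpack_from(record, 0))
    if header.identifier != b"INDX":
        raise ValueError("not an INDX record")
    return header
```

An `index_encoding` of 65001 then indicates that the index strings are UTF-8, and 1252 indicates CP1252.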
TAGX section
The TAGX section follows the INDX header and is essential for decoding the index values: it defines how many control bytes an entry contains, which bits correspond to which tag, and how many values a tag requires (most tags need one value, but some have two, maybe more).
offset | hex | bytes | content | comments |
---|---|---|---|---|
0 | 0x00 | 4 | Identifier | the characters T A G X |
4 | 0x04 | 4 | header length | the length of the TAGX header, including the previous 4 bytes |
8 | 0x08 | 4 | control byte count | the number of control bytes |
12 | 0x0c | n | tag table | the tag table entries (n = header length - 12, must be multiple of 4 bytes) |
Each tag table entry is 4 bytes long. The first byte is the tag, the second byte is the number of values, the third byte is the bit mask, and the fourth byte indicates the end of the control byte. If the fourth byte is 0x01, all other bytes of the entry are zero.
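A TAGX section can be split into its tag table along the following lines. This is a sketch under the layout described above; `TagxEntry` and `parse_tagx` are illustrative names, and big-endian byte order is assumed for the 32-bit fields.

```python
import struct
from collections import namedtuple

TagxEntry = namedtuple("TagxEntry", "tag values_per_entry bitmask end_flag")


def parse_tagx(data, offset=0):
    """Return (control_byte_count, [TagxEntry, ...]) for a TAGX section."""
    identifier, header_length, control_byte_count = struct.unpack_from(
        ">4sII", data, offset)
    if identifier != b"TAGX":
        raise ValueError("not a TAGX section")
    table_length = header_length - 12
    if table_length < 0 or table_length % 4:
        raise ValueError("tag table length must be a multiple of 4")
    entries = [
        TagxEntry(*struct.unpack_from("4B", data, offset + 12 + i))
        for i in range(0, table_length, 4)
    ]
    return control_byte_count, entries
```

Entries whose `end_flag` is 1 mark the end of a control byte and carry no tag of their own.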
Variable-width integers
Some parts of the Mobipocket format encode data as variable-width integers. These integers are represented big-endian with 7 bits per byte in bits 1-7. They may be either forward-encoded, in which case only the LSB has bit 8 set, or backward-encoded, in which case only the MSB has bit 8 set. For example, the number 0x11111 would be represented forward-encoded as:
0x04 0x22 0x91
And backward-encoded as:
0x84 0x22 0x11
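Both encodings can be reproduced with a short sketch. The function names are illustrative; the logic follows the description above, with seven payload bits per byte and the high bit marking the terminating byte.

```python
def encode_vwi(value, forward=True):
    """Encode a non-negative integer as a variable-width integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = []
    while True:
        groups.append(value & 0x7F)
        value >>= 7
        if value == 0:
            break
    groups.reverse()                # big-endian: most significant group first
    if forward:
        groups[-1] |= 0x80          # flag the last (least significant) byte
    else:
        groups[0] |= 0x80           # flag the first (most significant) byte
    return bytes(groups)


def decode_vwi_forward(data, pos=0):
    """Decode starting at data[pos]; returns (value, bytes_consumed)."""
    value = 0
    consumed = 0
    for byte in data[pos:]:
        value = (value << 7) | (byte & 0x7F)
        consumed += 1
        if byte & 0x80:
            return value, consumed
    raise ValueError("unterminated variable-width integer")


def decode_vwi_backward(data):
    """Decode the integer that ends at the last byte of data."""
    value = 0
    shift = 0
    consumed = 0
    for byte in reversed(data):
        value |= (byte & 0x7F) << shift
        shift += 7
        consumed += 1
        if byte & 0x80:
            return value, consumed
    raise ValueError("unterminated variable-width integer")
```

For the example value, `encode_vwi(0x11111)` yields the bytes 0x04 0x22 0x91 and `encode_vwi(0x11111, forward=False)` yields 0x84 0x22 0x11.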
Trailing entries
The Extra Data Flags field of the MOBI header indicates which, if any, trailing entries are appended to the end of each text record. Each set bit in the field indicates a trailing entry. The entries appear to occur in bit-order; e.g., trailing entry 1 immediately follows the text content and entry 16 occurs at the very end of the record. The effect and exact details of most of these entries are unknown. The trailing entries indicated by bits 2-16 appear to follow a common format. That format is:
<data><size>
Where <size> is the size of the entire trailing entry (including the size of <size>) as a backward-encoded Mobipocket variable-width integer.
Only a few bits have been identified:
bit | Data at end of records |
---|---|
0x0001 | Multi-byte character overlaps |
0x0002 | Some data to help with indexing |
0x0004 | Some data about uncrossable breaks |
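Because every entry for bits 2-16 ends with its own size, the entries can be stripped from the end of a record without understanding their content. The following is a sketch only: the function names are illustrative, and the entry for bit 1 (multibyte overlap) is deliberately left in place.

```python
def _backward_vwi(data, end):
    """Decode a backward-encoded integer whose last byte is data[end - 1]."""
    value = 0
    shift = 0
    pos = end
    while pos > 0:
        pos -= 1
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:             # high bit marks the first byte of the integer
            return value
    raise ValueError("no terminating byte found")


def strip_sized_trailing_entries(record, extra_data_flags):
    """Remove the <data><size> entries flagged by bits 2-16."""
    end = len(record)
    flags = (extra_data_flags & 0xFFFF) >> 1    # drop bit 1 (0x0001)
    while flags:
        if flags & 1:
            size = _backward_vwi(record, end)   # size includes the size bytes
            if size == 0 or size > end:
                raise ValueError("implausible trailing entry size")
            end -= size
        flags >>= 1
    return record[:end]
```

Since all of these entries share one format, only the number of set bits matters when stripping; the bit order becomes relevant only if the content of a particular entry is needed.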
Multibyte character overlap
When bit 1 of the Extra Data Flags field is set, each record is followed by a trailing entry containing any extra bytes necessary to complete a multibyte character which crosses the record boundary. The bytes do not participate in compression, regardless of which compression scheme is used for the file. However, unlike the trailing data bytes, the multibyte bytes (including the count byte) are included in any encryption. The overlapping bytes then reappear as normal content at the beginning of the following record. The trailing entry ends with a byte containing a count of the overlapping bytes plus additional flags.
offset | bytes | content | comments |
---|---|---|---|
0 | 0-3 | N terminal bytes of a multibyte character | |
N | 1 | Size & flags | bits 1-2 encode N, use of bits 3-8 is unknown |
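Removing the overlap entry amounts to reading the final byte and dropping N + 1 bytes. The sketch below assumes that any entries for bits 2-16 have already been removed, so that the overlap entry is the last thing in the record; the function name is illustrative.

```python
def strip_multibyte_overlap(record, extra_data_flags):
    """Drop the multibyte overlap entry if bit 1 (0x0001) is set."""
    if not (extra_data_flags & 0x0001) or not record:
        return record
    overlap = record[-1] & 0x03          # bits 1-2 of the final byte encode N
    entry_size = overlap + 1             # N overlap bytes plus the size byte
    if entry_size > len(record):
        raise ValueError("overlap entry larger than record")
    return record[:len(record) - entry_size]
```

What remains is the compressed (and possibly encrypted) text of the record itself.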
PalmDOC Compression
PalmDOC uses LZ77 compression techniques; an implementation for PalmDOC can be found on GitHub. DOC files can contain only compressed text. The format does not allow for any text formatting. This keeps files small, in keeping with the Palm philosophy. However, extensions to the format can use tags, such as HTML or PML, to include formatting within text. These extensions to PalmDoc are not interchangeable and are the basis for most eBook Reader formats on Palm devices.
LZ77 algorithms achieve compression by replacing portions of the data with references to matching data that has already passed through both encoder and decoder. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement "each of the next length characters is equal to the character exactly distance characters behind it in the uncompressed stream." (The "distance" is sometimes called the "offset" instead.)
In the PalmDoc format, a length-distance pair is always encoded by a two-byte sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding the distance, 3 go to encoding the length, and the remaining two are used to make sure the decoder can identify the first byte as the beginning of such a two-byte sequence. The exact algorithm needed to decode the compressed text can be found on the PalmDOC page.
PalmDOC data is always divided into 4096-byte blocks (uncompressed size) and the blocks are processed independently; no information from previous or later blocks is needed when a block is being compressed or decompressed.
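A decompressor for a single block can be sketched as follows. The two-byte length-distance pairs follow the description above (11 bits of distance, 3 bits of length). The handling of the remaining byte values, and the bias of 3 added to the length, are not spelled out in this section and are taken from commonly published descriptions of the PalmDOC scheme, so treat them as assumptions.

```python
def palmdoc_decompress(data):
    """Decompress one PalmDOC-compressed block (a sketch, not a reference)."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        i += 1
        if 0x01 <= byte <= 0x08:
            # Assumed: the next `byte` bytes are copied through unchanged.
            out += data[i:i + byte]
            i += byte
        elif byte < 0x80:
            # Assumed: 0x00 and 0x09-0x7F are literal bytes.
            out.append(byte)
        elif byte >= 0xC0:
            # Assumed: a space followed by the byte with its high bit cleared.
            out.append(0x20)
            out.append(byte ^ 0x80)
        else:
            # 0x80-0xBF: first byte of a two-byte length-distance pair.
            if i >= n:
                raise ValueError("truncated length-distance pair")
            pair = ((byte << 8) | data[i]) & 0x3FFF   # drop the 2 marker bits
            i += 1
            distance = pair >> 3                      # upper 11 bits
            length = (pair & 0x07) + 3                # lower 3 bits, biased by 3
            if distance == 0 or distance > len(out):
                raise ValueError("invalid distance")
            for _ in range(length):
                out.append(out[-distance])            # may overlap the new output
    return bytes(out)
```

Copying one byte at a time lets a pair refer to bytes that the same pair has just produced, which is how short repeating runs are expanded.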
PalmDOC does have support for bookmarks. These pointers are named and refer to an offset location in a file. If the file is edited these locations may no longer refer to the correct locations. Some reading programs allow the user to enter or edit these bookmarks while others treat them as a TOC. Some reading programs may ignore them entirely. They are stored at the end of the file itself so the full file needs to be scanned when loaded to find them.
Image Records
If the file contains images, they follow the text blocks, with each image using a single block. The 4096-byte record size in the PalmDoc header applies only to text records; image records may be larger.
Magic Records
In some cases, MobiPocket Creator adds a 2-zero-byte record after the text records in a file. This record is not included in the "record count" of text records in the PalmDoc header, and is also not used as the "first non-book index" in the MOBI header. (If the 2-zero-byte record is present, the index of the following block is used as the "first non-book index".)
MobiPocket Creator also ends files with three records: 'FLIS', 'FCIS', and 'end-of-file', in that order. The 'FLIS' and 'FCIS' records do not seem to be necessary for MobiPocket Reader or the Amazon Kindle 2 to read the file. The 'end-of-file' record might be necessary.
FLIS Record
The FLIS record appears to have a fixed value. The meaning of the values is not known.
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | the characters F L I S (0x46 0x4c 0x49 0x53) |
4 | 4 | ? | fixed value: 8 |
8 | 2 | ? | fixed value: 65 |
10 | 2 | ? | fixed value: 0 |
12 | 4 | ? | fixed value: 0 |
16 | 4 | ? | fixed value: -1 (0xFFFFFFFF) |
20 | 2 | ? | fixed value: 1 |
22 | 2 | ? | fixed value: 3 |
24 | 4 | ? | fixed value: 3 |
28 | 4 | ? | fixed value: 1 |
32 | 4 | ? | fixed value: -1 (0xFFFFFFFF) |
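Since every field is fixed, the 36-byte record can be generated or checked directly from the table. A minimal sketch, assuming big-endian fields; the names are illustrative.

```python
import struct

# identifier, then the ten fixed fields in table order (36 bytes in total).
FLIS_STRUCT = struct.Struct(">4sIHHIIHHIII")
FLIS_FIELDS = (b"FLIS", 8, 65, 0, 0, 0xFFFFFFFF, 1, 3, 3, 1, 0xFFFFFFFF)


def build_flis():
    """Return the fixed FLIS record as bytes."""
    return FLIS_STRUCT.pack(*FLIS_FIELDS)


def is_standard_flis(record):
    """True if the record matches the fixed values listed in the table."""
    return record == build_flis()
```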
FCIS Record
The FCIS record appears to have mostly fixed values.
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | the characters F C I S (0x46 0x43 0x49 0x53) |
4 | 4 | ? | fixed value: 20 |
8 | 4 | ? | fixed value: 16 |
12 | 4 | ? | fixed value: 1 |
16 | 4 | ? | fixed value: 0 |
20 | 4 | ? | text length (the same value as "text length" in the PalmDoc header) |
24 | 4 | ? | fixed value: 0 |
28 | 4 | ? | fixed value: 32 |
32 | 4 | ? | fixed value: 8 |
36 | 2 | ? | fixed value: 1 |
38 | 2 | ? | fixed value: 1 |
40 | 4 | ? | fixed value: 0 |
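The FCIS record differs from file to file only in the text length at offset 20, so it can be built from that one value. A minimal sketch, assuming big-endian fields; the function names are illustrative.

```python
import struct

# identifier, eight 32-bit fields, two 16-bit fields, one 32-bit field: 44 bytes.
FCIS_STRUCT = struct.Struct(">4s8IHHI")


def build_fcis(text_length):
    """Build an FCIS record for the given uncompressed text length."""
    return FCIS_STRUCT.pack(
        b"FCIS", 20, 16, 1, 0, text_length, 0, 32, 8, 1, 1, 0)


def fcis_text_length(record):
    """Read the text length field (offset 20) back out of an FCIS record."""
    fields = FCIS_STRUCT.unpack_from(record, 0)
    if fields[0] != b"FCIS":
        raise ValueError("not an FCIS record")
    return fields[5]
```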
End-of-file Record
The end-of-file record is a fixed 4-byte record. While the last two bytes appear to be a CRLF marker, the meaning of the first two bytes is unknown.
offset | bytes | content | comments |
---|---|---|---|
0 | 1 | ? | fixed value: 233 (0xe9) |
1 | 1 | ? | fixed value: 142 (0x8e) |
2 | 1 | ? | fixed value: 13 (0x0d) |
3 | 1 | ? | fixed value: 10 (0x0a) |
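One way to check whether a file ends with the three magic records is to compare the last three PDB records against their signatures. The sketch assumes the records have already been split into a list of byte strings; `records` and the function name are illustrative.

```python
EOF_RECORD = b"\xe9\x8e\r\n"    # 0xe9 0x8e 0x0d 0x0a


def ends_with_magic_records(records):
    """True if the last three records are FLIS, FCIS and end-of-file."""
    if len(records) < 3:
        return False
    flis, fcis, eof = records[-3:]
    return (flis[:4] == b"FLIS"
            and fcis[:4] == b"FCIS"
            and eof == EOF_RECORD)
```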
Compilation Records
KindleGen creates records of the compilation source (KindleGen 1.2-2.5), or of the compilation source and the compiler output (KindleGen 2.7 and later). These records are placed just before the End-of-file record (KindleGen 1.2-2.2) or just before the BOUNDARY record (KindleGen 2.3 and later).
MOBI files created with Mobipocket Creator, Amazon's Personal Document Service, or Kindle Direct Publishing (formerly Amazon DTP) do not include a SRCS record. In the past, kindlegen had an undocumented option to suppress this record, but the option was removed in 2010.
A SRCS record contains a zip archive of all source files (e.g., .opf, .ncx, .htm, .jpg, ...) given to the kindlegen command, which places the archive in the generated MOBI file. The record begins with the "SRCS" signature and looks as follows:
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | "SRCS" (0x53 0x52 0x43 0x53) |
4 | 4 | ? | fixed value(?): 0x00000010 |
8 | 4 | ? | fixed value(?): 0x0000002f |
12 | 4 | ? | fixed value(?): 0x00000001 |
16 | | zip | The zip archive continues to the end of this record |
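Because the zip archive simply follows the 16-byte header, the sources can be opened with the standard `zipfile` module. A minimal sketch; the function name is illustrative, and the three unknown header fields are skipped rather than interpreted.

```python
import io
import zipfile


def open_srcs_archive(record):
    """Return a ZipFile over the archive embedded in a SRCS record."""
    if record[:4] != b"SRCS":
        raise ValueError("not a SRCS record")
    # The archive starts at offset 16 and runs to the end of the record.
    return zipfile.ZipFile(io.BytesIO(record[16:]))
```

Calling `namelist()` on the result lists the source files that were given to kindlegen.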
A CMET record is a record whose content is the output of the compilation operation, and perhaps extra info. The record begins with the "CMET" signature and looks as follows:
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | "CMET" (0x43 0x4D 0x45 0x54) |
4 | 4 | ? | fixed value(?): 0x0000000C |
8 | 4 | text length | (big endian) |
12 | variable | text | compilation output text, line endings are CRLF |
variable | variable | ? | unknown data to the end of the record |
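The CMET layout can be read along these lines. This is a sketch: the function name is illustrative, the text is returned as raw bytes because its encoding is not stated here, and whatever follows the text is returned uninterpreted.

```python
import struct


def parse_cmet(record):
    """Return (compilation_output_bytes, trailing_unknown_bytes)."""
    identifier, _unknown, text_length = struct.unpack_from(">4sII", record, 0)
    if identifier != b"CMET":
        raise ValueError("not a CMET record")
    text = record[12:12 + text_length]
    if len(text) != text_length:
        raise ValueError("record shorter than its text length field")
    return text, record[12 + text_length:]
```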
Media Records (AUDI/VIDE)
kindlegen supports embedded audio and video for some Kindle platforms. Each media file is stored in a separate AUDI (audio) or VIDE (video) record.
A media record looks as follows:
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | "AUDI" (0x41 0x55 0x44 0x49) or "VIDE" (0x56 0x49 0x44 0x45) |
4 | 4 | ? | unknown value |
8 | 4 | ? | unknown value |
12 | | media | The media data continues to the end of this record |
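A media record can be separated into its parts as sketched below. The function name is illustrative, and the two unknown 4-byte values are returned together as raw bytes.

```python
def split_media_record(record):
    """Return (kind, unknown_header_bytes, media_bytes) for AUDI/VIDE."""
    kind = record[:4]
    if kind not in (b"AUDI", b"VIDE"):
        raise ValueError("not an AUDI or VIDE record")
    if len(record) < 12:
        raise ValueError("record shorter than the 12-byte header")
    return kind.decode("ascii"), record[4:12], record[12:]
```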
MBP
This is the extension used on an auxiliary side file for MOBI-formatted eBooks. It stores metadata used by the library software as well as user-entered data such as bookmarks, annotations, and the last read position. The file is created automatically by the reader program when the eBook is first opened and has a .mbp extension. The library management software in MobiPocket uses this file to obtain the information displayed in the library window, such as title, author, and description, so that it does not have to open the larger eBook file.
There is an ongoing effort to describe the binary MBP file format. There is also an MBP reader program that extracts notes from an MBP file.
eBook Creation
There are several ways to create eBooks in the MOBI format. The rules for the format of the source files needed to create eBooks in MOBI are spelled out in documents on the MobiPocket web site. The recommended tool, MobiPocket Creator, is available as a download from the web site.
EBooks can also be converted from other forms using the Windows version of the MobiPocket Reader. Once converted the file can be used on any device supported by MobiPocket Reader.
Guidelines
In order to better support the features of the MobiPocket Reader there are some guidelines that need to be followed when creating a book in this format.
- Do not specify a default font family, font size or other font attributes such as weight or color. This is a choice the person reading the eBook should be able to make. Font sizes and attributes can be specified for special headings and other specific items. Use only generic font families.
- Do not impose justification for standard text. It may be needed for captions and other special text.
- Do not use tables for anything except table data. Nested tables are not supported.
- Do not use blank lines to try and force page changes. Use the <mbp:pagebreak/> tag.
- Do not use multiple books for different devices. Instead use advanced features such as multi resolution images and platform specific frames.
Adapting images to various PDA screen resolutions
Note that the following section only applies to the original mobi format and is not used by Amazon AZW files.
The IMG tag in Mobipocket publications supports up to three source attributes for various resolutions: src, losrc and hisrc. This makes it possible to optimize the same ebook for various devices. The image to be displayed is dynamically selected by the Reader according to the resolution of the screen on the actual device:
Attribute | screen smallest size | example devices |
---|---|---|
losrc | <= 239 pixels | Low-resolution 160x160 Palm devices (PalmVx, Treo 600, Zire), smartphones (Nokia 3650, Sony Ericsson P800/900, Microsoft smartphones) |
src | >= 240 pixels (handhelds) | Pocket PC, high-resolution Palm devices (Sony Clie, Tungsten, Zire 71) |
hisrc | >= 480 pixels | any desktop or tablet PC |
Example:
<img hisrc="cover480x640.gif" src="cover220x300.gif" losrc="cover140x140.gif"/>
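The selection rule in the table can be written as a small function. This sketch only mirrors the thresholds listed above; the function name is illustrative, and the Reader's fallback behaviour when an attribute is missing is not covered here.

```python
def preferred_image_attribute(screen_width, screen_height):
    """Pick the IMG source attribute for a screen, per the table above."""
    smallest = min(screen_width, screen_height)
    if smallest >= 480:
        return "hisrc"
    if smallest >= 240:
        return "src"
    return "losrc"
```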
Note also that images are subject to a 63KB internal limit (a restriction of the Mobipocket .PRC format). GIFs have to be smaller than 63KB; GIF optimization programs such as Ulead Smart Saver can be used to achieve this. (If images are bigger than 63KB, MobiGEN automatically resizes them to fit within the limit, but you might not like the result.) For JPEG images, a lower quality setting is used to bring the file size down without reducing the pixel dimensions.
HTML and CSS Tips for MOBI creation
- Kindle and Kindle DX do not handle the soft hyphen HTML entity correctly. Use the <shy/> tag instead.
- Grey text is displayed as white on some devices. To avoid this problem, add:
@media amzn-mobi and (monochrome) {
    .mygreytextclass {
        color: black;
    }
}
- The kindlegen tool ignores padding-left. If necessary, you can work around this by adding an element inside that element and setting its left margin.
- The kindlegen tool's CSS parser is sometimes buggy. As a result, if you have:
div.foo p {
...
}
the kindlegen tool often incorrectly interprets it as:
div.foo, div.foo p {
...
}
In situations where you want to apply the style to only the inner tag, you must add a custom class to the paragraphs inside the outer tag and use that CSS selector by itself.
For additional tips specific to developing content for KF8-capable readers, see the KF8 CSS Tips.
Format limitations
There are many limitations in the MOBI format. A few are listed here.
- Blocks of text can never have a greater than normal margin on their right side.
- Left margins can only be specified in 1em increments. Text can only have a hanging indent if it has no left margin. More recent Kindle renderers use left margin increments of roughly 2em.
- Text cannot flow around images taller than one line of text.
- Image sizes cannot be scaled with font size.
- In some -- but not all -- Mobipocket renderers, text with a left margin changes that margin value per line based upon the font-size at which point the preceding line-break occurred.
- Many measures, such as the indent of a hanging indent, cannot be specified in ems.
- Individual items of text cannot be displayed in a monospace font.
- Tables display wildly differently on different Mobipocket renderers, especially tables which cross more than one screen.
- Nested tables are not supported at all.
- In addition you only get the full range of Mobipocket's formatting capabilities if you have markup written to use Mobipocket's non-standard, extended, and under-documented implementation of HTML 3.2. See: File tag reference on the mobipocket web site.
MOBI DRM
Mobi DRM can optionally be applied to this file format. There is the standard scheme supported by Mobipocket and Overdrive servers. This is based on an ID (the PID) derived from the reading device or program. The PID must be known to the server when an eBook is purchased; it is embedded in the file, which is thereby locked to the device. The licensing scheme does permit multiple devices (usually up to 4) to be supported. In this case the server needs to know the device ID of every device. If you add a device you must tell the server and redownload the eBook to be able to read it on the new device. Normally there is no charge to add a device or to redownload the eBook. If the dealer goes out of business you may not be able to add a device, since there would be no way to redownload the file.
A second, simpler scheme requires only the account login name and password used to purchase the eBook. Once this data is entered the eBook can be read. Entering this data is only required once per device. This is a new scheme and some readers may not support it.
A third method, used on some eBooks, is a generic MOBI key. The file is encrypted, but only with the generic MOBI key (not a PID-specific key). This means the eBook can be read by any MobiPocket Reader software, on any device, but not by any non-MobiPocket software.
The DRM applies only to the eBook itself and not to the metadata. A library routine can read the metadata without having to unlock the eBook. Some programs have been devised to even be able to change this information without touching the DRM portion of the file.
MOBI eBook Readers and converters
In addition to the Readers supplied by MobiPocket there are also third-party readers and converters. These include:
- Calibre
- Stanza
- FBReader
- Book Designer
- BookMedia
- STDU Viewer
- Sumatra PDF
- MBP_reader (program that can extract MBP notes to text files).
- Kindle for PC or Mac
- EPUB to Kindle converter
- PDF/ePUB to Kindle Tool
- Kindle Book Development Tool
- KindleUnpack - previously called MobiUnpack - KindleUnpack will explode a mobi file into its original form. Also called a mobi decoder.
- PDF to ePUB/Mobi Converter
- KindleGen - Official Amazon tool to convert ePub to Mobi (AZW) or otherwise generate Mobi format.
MOBI eBook Hardware Readers
- Bookeen Cybook Gen3
- Bookeen Cybook Opus
- Hanlin V3 / Bebook / EZ Reader
- iRex iLiad
- iRex Digital Reader
- Amazon Kindle Readers
- Onyx BOOX readers
Not all eBook readers that support Mobi format have the same features. Check Mobi Comparison for details on what is actually supported.
Create a MOBI file from an ePub file
Here is one method to create a MOBI file from an ePub file.
1. Make sure you use only h1, h2 and h3 headers for the TOC entries you want (the reason becomes clear in step 9; alternatively, use step 10).
2. Build your entire ePub in Sigil, importing your HTML files as you go with the "add existing item" option.
3. Finish your ePub, but do not add the cover page.
4. Download and install MobiPocket Creator.
5. Unzip your ePub.
6. Double-click the OPF file.
7. The book opens in MobiPocket Creator.
8. Drag and drop your cover into MobiPocket Creator.
9. Use MobiPocket Creator to make an html.TOC with headers 1-2-3 only, or
10. alternatively, point MobiPocket Creator to an existing html.toc by editing the Guide Properties section. (N.B.: the toc.ncx will already be in the appropriate folder inside the "My Publications" folder of your MobiPocket Creator directory.)
11. Click "Build."
12. You now have a fully functional PRC file.
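The "unzip your ePub" and "double-click the OPF" steps above assume you know where the OPF file sits inside the archive. As a small aid, the sketch below reads the location from the ePub's `META-INF/container.xml`, which names the package file in the `full-path` attribute of its `rootfile` element. The function name is illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def find_opf_path(epub):
    """Return the path of the OPF file inside an ePub (path or file object)."""
    with zipfile.ZipFile(epub) as archive:
        root = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = root.find(f".//{{{CONTAINER_NS}}}rootfile")
    if rootfile is None:
        raise ValueError("container.xml does not name a rootfile")
    return rootfile.get("full-path")
```

After extracting the archive with `zipfile.ZipFile.extractall`, the returned path is the file to open in MobiPocket Creator.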
For more information
- Mobipocket Creator - free download, also see MobiPocket Creator
- Mobipocket Development center - creation documentation
- content generation - see paragraph formatting for CSS like features although there is no CSS in MOBI.
- Amazon KindleGen - an upgrade to mobiGen but works fine for mobi books.
- KindleGen - our wiki page on using KindleGen.
- MobileRead forum - Mobi unpack, take apart a mobi file.
- Java Mobi Metadata Editor - edit, add, and remove EXTH tags in mobi files.