<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>超群.com's Blog &#187; win32com</title>
	<atom:link href="http://www.fuchaoqun.com/tag/win32com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.fuchaoqun.com</link>
	<description></description>
	<lastBuildDate>Thu, 08 Sep 2011 15:08:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Converting Office Word Files to HTML with Python</title>
		<link>http://www.fuchaoqun.com/2009/03/use-python-convert-word-to-html-with-win32com/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=use-python-convert-word-to-html-with-win32com</link>
		<comments>http://www.fuchaoqun.com/2009/03/use-python-convert-word-to-html-with-win32com/#comments</comments>
		<pubDate>Thu, 12 Mar 2009 05:57:38 +0000</pubDate>
		<dc:creator>超群.com</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[pywin32]]></category>
		<category><![CDATA[win32com]]></category>
		<guid isPermaLink="false">http://chaoqun.17348.com/?p=182</guid>
		<description><![CDATA[The test environment here is Windows XP, Office 2007, Python 2.5.2, and pywin32 build 213. The approach uses the win32com interface to call the Office API directly. Its advantages are simplicity and good compatibility: anything Office can handle, Python can handle too, and the output matches what Word's "Save As" produces. #!/usr/bin/env python &#160; #coding=utf-8 &#160; from win32com import client as wc &#160; word = wc.Dispatch&#40;'Word.Application'&#41; &#160; doc = word.Documents.Open&#40;'d:/labs/math.doc'&#41; &#160; doc.SaveAs&#40;'d:/labs/math.html', 8&#41; &#160; doc.Close&#40;&#41; &#160; word.Quit&#40;&#41; The key line is doc.SaveAs(&#8216;d:/labs/math.html&#8217;, 8). Many articles online write it as doc.SaveAs(&#8216;d:/labs/math.html&#8217;, win32com.client.constants.wdFormatHTML) instead, which fails immediately with: AttributeError: class Constants has no attribute &#8216;wdFormatHTML&#8217; The same code can convert a Word file to any other format that Office 2007 supports; to produce a PDF, for example, change 8 to 17. The table below lists all file format constants supported by Office 2007: wdFormatDocument = 0 wdFormatDocument97 = 0 wdFormatDocumentDefault = [...]]]></description>
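			<content:encoded><![CDATA[<p>The test environment here is Windows XP, Office 2007, Python 2.5.2, and pywin32 build 213. The approach uses the win32com interface to call the Office API directly. Its advantages are simplicity and good compatibility: anything Office can handle, Python can handle too, and the output matches what Word's "Save As" produces.</p>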
<blockquote>
<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python</span>
&nbsp;
<span style="color: #808080; font-style: italic;">#coding=utf-8</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> win32com <span style="color: #ff7700;font-weight:bold;">import</span> client <span style="color: #ff7700;font-weight:bold;">as</span> wc
&nbsp;
word = wc.<span style="color: black;">Dispatch</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Word.Application'</span><span style="color: black;">&#41;</span>
&nbsp;
doc = word.<span style="color: black;">Documents</span>.<span style="color: black;">Open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'d:/labs/math.doc'</span><span style="color: black;">&#41;</span>
&nbsp;
doc.<span style="color: black;">SaveAs</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'d:/labs/math.html'</span>, <span style="color: #ff4500;">8</span><span style="color: black;">&#41;</span>
&nbsp;
doc.<span style="color: black;">Close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
word.<span style="color: black;">Quit</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>
</blockquote>
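<p>As a slightly more defensive variant of the script above, the sketch below wraps the same calls in try/finally so that Word is closed even when opening or saving fails. The function name <code>convert_word</code> and its <code>dispatch</code> parameter are hypothetical additions for illustration (the parameter only exists so the logic can be exercised without Word); by default it uses the same <code>Dispatch</code> call as the original. Passing absolute paths is an assumption on my part that it is safer, since Word may resolve relative paths against its own working directory.</p>

```python
import os


def convert_word(src, dst, file_format=8, dispatch=None):
    """Open src in Word and save it as dst using a WdSaveFormat number.

    dispatch is a hypothetical injection point for testing; when omitted,
    the real win32com.client.Dispatch is used (Windows with Word installed).
    """
    if dispatch is None:
        # Imported lazily because pywin32 is only available on Windows.
        from win32com import client
        dispatch = client.Dispatch

    word = dispatch('Word.Application')
    try:
        # Absolute paths avoid depending on Word's own working directory.
        doc = word.Documents.Open(os.path.abspath(src))
        try:
            doc.SaveAs(os.path.abspath(dst), file_format)
        finally:
            doc.Close()
    finally:
        # Quit even on failure so no Word process is left running.
        word.Quit()
```

<p>On a machine set up like the one described here, <code>convert_word('d:/labs/math.doc', 'd:/labs/math.html', 8)</code> should behave like the original script.</p>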
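<p>The key line is doc.SaveAs(&#8216;d:/labs/math.html&#8217;, 8). Many articles online write it as doc.SaveAs(&#8216;d:/labs/math.html&#8217;, win32com.client.constants.wdFormatHTML) instead. In this setup, where Word is started with a plain Dispatch call and no type library wrappers have been generated (for example with makepy), that version fails immediately with:</p>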
<blockquote><p>AttributeError: class Constants has no attribute &#8216;wdFormatHTML&#8217;</p></blockquote>
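<p>If you would rather use the symbolic name where it is available, one possible approach is to look it up and fall back to the literal number when the lookup raises the AttributeError shown above. This is only a sketch: <code>save_format</code>, <code>FALLBACK</code> and <code>convert_with_named_constant</code> are hypothetical names, and the second function assumes that <code>win32com.client.gencache.EnsureDispatch</code> is used to generate the Word wrappers so that <code>constants</code> gets populated.</p>

```python
# Literal values taken from the format table in this post.
FALLBACK = {'wdFormatHTML': 8, 'wdFormatFilteredHTML': 10, 'wdFormatPDF': 17}


def save_format(name, constants=None):
    """Return the numeric save format for name.

    Prefers the value from a win32com constants object when it has one,
    and falls back to the hard-coded number otherwise.
    """
    if constants is not None:
        try:
            return int(getattr(constants, name))
        except AttributeError:
            pass  # constants not populated: use the literal value below
    return FALLBACK[name]


def convert_with_named_constant(src, dst):
    """Windows-only usage sketch; needs pywin32 and an installed Word."""
    from win32com.client import constants, gencache
    # EnsureDispatch generates (or loads) the wrappers for Word's type
    # library, which is what makes the wdFormat* names resolvable.
    word = gencache.EnsureDispatch('Word.Application')
    try:
        doc = word.Documents.Open(src)
        doc.SaveAs(dst, save_format('wdFormatHTML', constants))
        doc.Close()
    finally:
        word.Quit()
```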
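<p>The same code can convert a Word file to any other format that Office 2007 supports; to produce a PDF, for example, change 8 to 17. The table below lists all file format constants supported by Office 2007:</p>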
<pre>wdFormatDocument                    =  0
wdFormatDocument97                  =  0
wdFormatDocumentDefault             = 16
wdFormatDOSText                     =  4
wdFormatDOSTextLineBreaks           =  5
wdFormatEncodedText                 =  7
wdFormatFilteredHTML                = 10
wdFormatFlatXML                     = 19
wdFormatFlatXMLMacroEnabled         = 20
wdFormatFlatXMLTemplate             = 21
wdFormatFlatXMLTemplateMacroEnabled = 22
wdFormatHTML                        =  8
wdFormatPDF                         = 17
wdFormatRTF                         =  6
wdFormatTemplate                    =  1
wdFormatTemplate97                  =  1
wdFormatText                        =  2
wdFormatTextLineBreaks              =  3
wdFormatUnicodeText                 =  7
wdFormatWebArchive                  =  9
wdFormatXML                         = 11
wdFormatXMLDocument                 = 12
wdFormatXMLDocumentMacroEnabled     = 13
wdFormatXMLTemplate                 = 14
wdFormatXMLTemplateMacroEnabled     = 15
wdFormatXPS                         = 18</pre>
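<p>To avoid scattering bare numbers through a script, the table above could be mirrored in code. The sketch below does this with an <code>IntEnum</code>, which assumes Python 3.4 or later (not the Python 2.5.2 used for the tests in this post); the class name <code>WdSaveFormat</code> is simply borrowed from the name of the Word enumeration. Names that share a value, such as wdFormatDocument and wdFormatDocument97, become aliases of the first name defined.</p>

```python
from enum import IntEnum


class WdSaveFormat(IntEnum):
    """Save format numbers copied from the table in this post."""
    wdFormatDocument = 0
    wdFormatDocument97 = 0          # alias of wdFormatDocument
    wdFormatDocumentDefault = 16
    wdFormatDOSText = 4
    wdFormatDOSTextLineBreaks = 5
    wdFormatEncodedText = 7
    wdFormatFilteredHTML = 10
    wdFormatFlatXML = 19
    wdFormatFlatXMLMacroEnabled = 20
    wdFormatFlatXMLTemplate = 21
    wdFormatFlatXMLTemplateMacroEnabled = 22
    wdFormatHTML = 8
    wdFormatPDF = 17
    wdFormatRTF = 6
    wdFormatTemplate = 1
    wdFormatTemplate97 = 1          # alias of wdFormatTemplate
    wdFormatText = 2
    wdFormatTextLineBreaks = 3
    wdFormatUnicodeText = 7         # alias of wdFormatEncodedText
    wdFormatWebArchive = 9
    wdFormatXML = 11
    wdFormatXMLDocument = 12
    wdFormatXMLDocumentMacroEnabled = 13
    wdFormatXMLTemplate = 14
    wdFormatXMLTemplateMacroEnabled = 15
    wdFormatXPS = 18
```

<p>When handing a member to SaveAs, converting it explicitly, as in <code>int(WdSaveFormat.wdFormatPDF)</code>, keeps the value a plain integer.</p>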
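<p>The names map fairly literally to the corresponding file formats. If you are on Office 2003, it may not support all of them. There are two HTML formats to choose from, wdFormatHTML and wdFormatFilteredHTML (values 8 and 10). The difference is that with wdFormatHTML, OLE objects in the Word file such as equations are stored as WMF images, whereas with wdFormatFilteredHTML the equation images are stored as GIF. On visual inspection, the HTML generated with wdFormatFilteredHTML is also noticeably cleaner than the wdFormatHTML output.</p>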
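<p>The choice between the two HTML formats can be captured in a small helper. This is a minimal sketch with a hypothetical function name, <code>html_target</code>; defaulting to the filtered format is my assumption, based on the observation that its output is cleaner.</p>

```python
import os

WD_FORMAT_HTML = 8            # OLE objects such as equations stored as WMF
WD_FORMAT_FILTERED_HTML = 10  # equation images stored as GIF, leaner markup


def html_target(src, filtered=True):
    """Return (output_path, save_format) for converting src to HTML."""
    base, _ext = os.path.splitext(src)
    fmt = WD_FORMAT_FILTERED_HTML if filtered else WD_FORMAT_HTML
    return base + '.html', fmt
```

<p>The returned pair can be passed straight to doc.SaveAs in the script at the top of this post.</p>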
<p>You can also call the Office API through COM from any other language, such as PHP.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.fuchaoqun.com/2009/03/use-python-convert-word-to-html-with-win32com/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
