DailyNotes/2005/11/14


  1. How to create a custom data type that can be read off wx.Clipboard
  2. Getting CF_HTML off the clipboard in a cross-platform manner
  3. Script for the clipboard -> blog transfer
  4. Notelets
    1. Personal
    2. Work
  5. Today's pictures
  6. Yesterday's pictures
  7. Open Threads

How to create a custom data type that can be read off wx.Clipboard

/ClipboardAndCustomDataInWxPython


I've been trying to create a custom wx.DataObject that I can put on, and get back off, the wx.Clipboard.

Getting CF_HTML off the clipboard in a cross-platform manner

I have written Win32-oriented Python code to copy 'HTML Format' data from the Win32 clipboard into a new message on my blog. Now I'm trying to translate this code so that it can work on OS X and Linux. The obvious place for me to start is wxPython, specifically wx.Clipboard and wx.DataObject (in particular wx.DataObjectSimple).

I've been trying to figure out how to read custom items off wx.Clipboard. So far I am able to read off different formats only by using Win32-specific methods.

Consider the example of copying a selection from an HTML page in Firefox 1.0.7 on Windows XP. I can see the available data formats on the clipboard by running getAvailableFormats() (see code below):

>>> ClipboardToBlog.getAvailableFormats()
{'CF_TEXT': 1, 'CF_LOCALE': 16, 'text/_moz_htmlcontext': 49898, 'text/html': 49308, 'text/_moz_htmlinfo': 49935, 'Ole Private Data': 49171, 'DataObject': 49161, 'CF_OEMTEXT': 7, 'HTML Format': 49426, 'CF_UNICODETEXT': 13}

I can read the text format off the clipboard with:

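import wx

def GetTextFromClipboard():
    """
    Return the clipboard's text content, or None if the clipboard
    cannot be opened or holds no text.
    """
    s = None
    clipboard = wx.Clipboard()
    if clipboard.Open():
        try:
            if clipboard.IsSupported(wx.DataFormat(wx.DF_TEXT)):
                data = wx.TextDataObject()
                if clipboard.GetData(data):
                    s = data.GetText()
        finally:
            # close exactly once, whether or not text was found
            clipboard.Close()
    return s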

e.g.,

>>> LearningWxClipboard.GetTextFromClipboard()
u"I have written Win32-oriented Python code to copy 'HTML Format' data from the Win32 clipboard into a new message on my blog. Now, I'm trying to translate this code so that it can work on OS X and Linux. The obvious place to start for me is using wxPython, specifically [WWW]wx.Clipboard, [WWW]wx.DataObject, specifically [WWW]wx.DataObjectSimple.\r\n\r\n"
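
One way to make sure the clipboard is closed exactly once, even when something goes wrong between Open() and Close(), is a small context manager. This is only a sketch, written for Python 3: `opened` and `FakeClipboard` are hypothetical names of my own, not part of wxPython, and the sketch assumes only that the clipboard object has `Open()` and `Close()` methods like the ones used above. The fake object is there so the example runs without wx installed.

```python
from contextlib import contextmanager


@contextmanager
def opened(clipboard):
    """Yield the result of clipboard.Open(); close once on exit if it opened."""
    is_open = clipboard.Open()
    try:
        yield is_open
    finally:
        if is_open:
            clipboard.Close()


class FakeClipboard:
    """Stand-in with the same Open/Close method names (an assumption for the demo)."""

    def __init__(self):
        self.open_calls = 0
        self.close_calls = 0

    def Open(self):
        self.open_calls += 1
        return True

    def Close(self):
        self.close_calls += 1


fake = FakeClipboard()
with opened(fake) as ok:
    status = "open" if ok else "unavailable"
```

With a real clipboard object, the body of the `with` block would hold the IsSupported/GetData calls, and the function could return its result from inside the block.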

Now, I want to retrieve the 'HTML Format' data. First, I set up the wx.DataFormat:

CFHtmlFormat = wx.DataFormat(0)
CFHtmlFormat.SetId('HTML Format')

Note that

CFHtmlFormat.GetType()

returns 49426, which is the same value I get for 'HTML Format' from the Win32 enumeration.

[writing in progress....]
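
Whichever way the 'HTML Format' data ends up being read, what comes back is a CF_HTML payload: a short text header (Version, StartHTML, EndHTML, StartFragment, EndFragment) followed by the HTML itself, with the header numbers giving byte offsets into the whole payload. The sketch below builds a tiny payload and slices it back apart, which is the same idea the CF_HTML class in the script below uses. It is written for Python 3 (the script is Python 2), and `build_cf_html` and `parse_cf_html` are hypothetical helper names of my own; the 8-digit zero padding is a convenience so the header length is known before the offsets are filled in.

```python
import re

HEADER_TEMPLATE = (
    "Version:0.9\r\n"
    "StartHTML:{:08d}\r\n"
    "EndHTML:{:08d}\r\n"
    "StartFragment:{:08d}\r\n"
    "EndFragment:{:08d}\r\n"
)
START_MARK = b"<!--StartFragment-->"
END_MARK = b"<!--EndFragment-->"


def build_cf_html(fragment):
    """Wrap an HTML fragment (str) in a minimal CF_HTML payload (bytes)."""
    prefix = b"<html><body>" + START_MARK
    body = fragment.encode("utf-8")
    suffix = END_MARK + b"</body></html>"
    # Fixed-width numbers mean the header length does not depend on the values.
    header_len = len(HEADER_TEMPLATE.format(0, 0, 0, 0))
    start_html = header_len
    start_fragment = start_html + len(prefix)
    end_fragment = start_fragment + len(body)
    end_html = end_fragment + len(suffix)
    header = HEADER_TEMPLATE.format(start_html, end_html, start_fragment, end_fragment)
    return header.encode("ascii") + prefix + body + suffix


def parse_cf_html(payload):
    """Return (html, fragment) as str, sliced out by the header's byte offsets."""
    offsets = {}
    for key in ("StartHTML", "EndHTML", "StartFragment", "EndFragment"):
        pattern = rb"^" + key.encode("ascii") + rb":(\d+)"
        match = re.search(pattern, payload, re.MULTILINE)
        if match is None:
            raise ValueError("Cannot find %s in CF_HTML" % key)
        offsets[key] = int(match.group(1))
    html = payload[offsets["StartHTML"]:offsets["EndHTML"]]
    fragment = payload[offsets["StartFragment"]:offsets["EndFragment"]]
    return html.decode("utf-8"), fragment.decode("utf-8")


payload = build_cf_html("<b>hello</b>")
html, fragment = parse_cf_html(payload)
```

Slicing the raw bytes before decoding matters because the offsets count bytes, not characters, so non-ASCII text would shift the slice if the payload were decoded first.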

Script for the clipboard -> blog transfer


#AUTHOR: Raymond Yee.  You are free to use/reuse this code with attribution.
#module ClipboardToBlog
__all__ = ['cb2RY','cb2IU']

import win32clipboard

from BaseHTMLProcessor import BaseHTMLProcessor

class CommentStripper (BaseHTMLProcessor):
    """
    strip comments
    """
    def __init__(self):
        BaseHTMLProcessor.__init__(self)
    def handle_comment(self, text):
        pass

class MoinIconStripper(BaseHTMLProcessor):
    """
    if it's a moin-www.png and the alt is [WWW], strip it away
    """
    def __init__(self,s):
        BaseHTMLProcessor.__init__(self)
        self.feed(s)
        self.close()

    def start_img(self, attrs):
        import os.path
        attrhash = dict(attrs)
        src = attrhash.get('src')
        alt = attrhash.get('alt')
        # drop only the MoinMoin external-link icon; pass every other <img> through
        is_moin_icon = (src is not None
                        and os.path.split(src)[1] == 'moin-www.png'
                        and alt == '[WWW]')
        if not is_moin_icon:
            BaseHTMLProcessor.unknown_starttag(self, 'img', attrs)

    def __str__(self):
        return self.output()


class CF_HTML_Exception(Exception):
    pass

class CF_HTML:
    """
    s is of the form CF_HTML
    """
    def __init__(self,s):
        self.parseOut(s)

    def parseOut(self,s):
        import re

        # pick out version, start, end
        rawstr = r"""^Version:(.+)$"""
        match_obj = re.search(rawstr, s,  re.MULTILINE)
        if (match_obj):
            version = match_obj.group(1)
        else:
            raise CF_HTML_Exception('Cannot find version in CF_HTML')

        rawstr = r"""^StartHTML:(.+)$"""
        match_obj = re.search(rawstr, s,  re.MULTILINE)
        if (match_obj):
            StartHTML = int(match_obj.group(1))
        else:
            raise CF_HTML_Exception('Cannot find StartHTML in CF_HTML')


        rawstr = r"""^EndHTML:(.+)$"""
        match_obj = re.search(rawstr, s,  re.MULTILINE)
        if (match_obj):
            EndHTML = int(match_obj.group(1))
        else:
            raise CF_HTML_Exception('Cannot find EndHTML in CF_HTML')

        rawstr = r"""^StartFragment:(.+)$"""
        match_obj = re.search(rawstr, s,  re.MULTILINE)
        if (match_obj):
            StartFragment = int(match_obj.group(1))
        else:
            raise CF_HTML_Exception('Cannot find StartFragment in CF_HTML')


        rawstr = r"""^EndFragment:(.+)$"""
        match_obj = re.search(rawstr, s,  re.MULTILINE)
        if (match_obj):
            EndFragment = int(match_obj.group(1))
        else:
            raise CF_HTML_Exception('Cannot find EndFragment in CF_HTML')

        #print version,StartHTML, EndHTML, StartFragment, EndFragment    

        self.htmlString = s[StartHTML:EndHTML]
        htmlFragment = s[StartFragment:EndFragment]

        # strip comments from htmlFragment

        #print htmlString

        parser = CommentStripper()
        parser.feed(htmlFragment)
        parser.close()

        self.htmlFragment = parser.output()


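def getCF_HTML():
    """
    Return the raw 'HTML Format' clipboard data, or None if that format is absent.
    """
    af = getAvailableFormats()
    if 'HTML Format' in af:
        win32clipboard.OpenClipboard()
        try:
            s = win32clipboard.GetClipboardData(af['HTML Format'])
        finally:
            # always release the clipboard, even if the read fails
            win32clipboard.CloseClipboard()
        return s
    else:
        return None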


def buildNativeFormatList():
    """
    builds a dictionary mapping each predefined CF_* format value
    to its constant name
    """
    h = {}
    for key in vars(win32clipboard):
        if key.startswith('CF_'):
            # getattr avoids eval on a constructed string
            h[int(getattr(win32clipboard, key))] = key
    return h

def getAvailableFormats():
    """
    Return a dictionary mapping the name of each format currently on the
    clipboard to its numeric ID; IDs with no known name are collected in
    a list under 'UNKNOWN'.
    """
    availableFormats = {}
    nativeFormats = buildNativeFormatList()

    win32clipboard.OpenClipboard()
    try:
        val = win32clipboard.EnumClipboardFormats(0)
        while val:
            try:
                # succeeds only for registered (non-predefined) formats
                format = win32clipboard.GetClipboardFormatName(val)
                availableFormats[format] = val
            except Exception:
                if val in nativeFormats:
                    availableFormats[nativeFormats[val]] = val
                else:
                    # list.append returns None, so append in place
                    availableFormats.setdefault('UNKNOWN', []).append(val)
            val = win32clipboard.EnumClipboardFormats(val)
    finally:
        win32clipboard.CloseClipboard()
    return availableFormats

class MTAccount:
    def __init__(self,user,password,url,blogediturl,blogID):
        self.user = user
        self.password = password
        self.url = url
        self.blogediturl = blogediturl
        self.blogID = blogID


def postToMT(account,body):
    from PyMT import PyMT
    import webbrowser
    mt = PyMT(account.url,account.user,account.password)
    content = {'title':'New Post', 'description':body, 'categories':[0] }
    try:
        postID = mt.newPost(account.blogID,content,0)
        webbrowser.open(account.blogediturl % (str(postID),str(account.blogID)))
    except Exception:
        print('error in posting to MT')

class ManilaAccount:
    def __init__(self,user,password,url):
        self.user = user
        self.password = password
        self.url = url


def postNewsItemToManila(account,body):
    from manilalib import Site, NewsItem
    s = Site(account.url,account.user,account.password)
    newsitem = NewsItem('New Post','http://raymondyee.net/wiki',body,'Unclassified')
    try:
        s.messages.append(newsitem)
        return len(s.messages)
    except Exception:
        print('error in posting to Manila')


def cb2RY():
    s = getCF_HTML()
    if s:
        cfhtml = CF_HTML(s)
        mtaccount = MTAccount('USER','PASSWORD','http://raymondyee.net/mt/mt-xmlrpc.cgi',"http://raymondyee.net/mt/mt.cgi?__mode=view&_type=entry&id=%s&blog_id=%s",'1')
        postToMT(mtaccount, str(MoinIconStripper(cfhtml.htmlFragment)))

def cb2IU():
    s = getCF_HTML()
    if s:
        cfhtml = CF_HTML(s)
        account = ManilaAccount('USER','PASSWORD','http://iu.berkeley.edu/rdhyee')
        postNewsItemToManila(account,str(MoinIconStripper(cfhtml.htmlFragment)))


if __name__ == '__main__':
    import sys
    # expect one argument naming the target blog: RY or IU
    if len(sys.argv) > 1:
        target = sys.argv[1]
        if target == 'RY':
            cb2RY()
        elif target == 'IU':
            cb2IU()
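
The CommentStripper and MoinIconStripper classes above depend on BaseHTMLProcessor, which as far as I know is the sgmllib-based class from Dive Into Python; sgmllib is not in the Python 3 standard library. As a rough sketch of how the same two filters (drop comments, drop the moin-www.png link icon) might look with Python 3's html.parser, assuming nothing beyond the standard library: `ClipboardHTMLCleaner` and `clean_fragment` are hypothetical names, and the re-serialization here is simpler than what BaseHTMLProcessor does, so output may differ in small details such as attribute quoting.

```python
import os.path
from html import escape
from html.parser import HTMLParser


class ClipboardHTMLCleaner(HTMLParser):
    """Re-emit HTML, dropping comments and MoinMoin [WWW] link icons."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def _is_moin_icon(self, tag, attrs):
        if tag != "img":
            return False
        attr_map = dict(attrs)
        src = attr_map.get("src") or ""
        return os.path.basename(src) == "moin-www.png" and attr_map.get("alt") == "[WWW]"

    def _format_attrs(self, attrs):
        out = []
        for name, value in attrs:
            if value is None:
                out.append(" " + name)
            else:
                # The parser hands over unescaped values, so escape them again.
                out.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(out)

    def handle_starttag(self, tag, attrs):
        if not self._is_moin_icon(tag, attrs):
            self.pieces.append("<%s%s>" % (tag, self._format_attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        if not self._is_moin_icon(tag, attrs):
            self.pieces.append("<%s%s />" % (tag, self._format_attrs(attrs)))

    def handle_endtag(self, tag):
        self.pieces.append("</%s>" % tag)

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_entityref(self, name):
        self.pieces.append("&%s;" % name)

    def handle_charref(self, name):
        self.pieces.append("&#%s;" % name)

    def handle_comment(self, data):
        pass  # comments are dropped

    def output(self):
        return "".join(self.pieces)


def clean_fragment(html_text):
    """Run one fragment through the cleaner and return the filtered HTML."""
    cleaner = ClipboardHTMLCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return cleaner.output()


sample = '<p>See <img src="/wiki/img/moin-www.png" alt="[WWW]"><a href="http://example.org/">site</a><!-- note --></p>'
cleaned = clean_fragment(sample)
```

Doing both filters in one pass avoids parsing the fragment twice, which the two-class arrangement in the script does.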

Notelets

Personal

Laura and I celebrate 6 months of wedded blissful happiness today!

Work

Today's pictures

Picture288_14Nov05
IMGP4507
Stacey's Books in San Francisco

Yesterday's pictures

giant pumpkin

Open Threads

I usually like to work in parallel on a number of entries. Here I list them so they can be easily noted and accessed: