1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
|
# PICKit2 Firmware
Trying to make sense of [pk2cmd scripts](pk2cmd/pk2cmd_notes.html#scripts) lead me to needing to reverse engineer some of the firmware on the programmer, so here we are.
## Recon
The PICKit2 consists of a `PIC18F2550` and some supporting logic to program devices over ICSP. I have the `TODO: canakit, images` programmer which includes a ZIF socket, but seems to be functionally equivalent to connecting the same ICSP pins to the programmer.
Hardware-wise, it's worth noting the LEDs are driven directly by the controller on the programmer, and of course that the USB connector is wired to the `PIC18F2550` as well. This is informative because in addition to programming and handling pk2cmd scripts, the microcontroller is responsible for USB traffic as well.
## The Good Stuff
The [firmware](PK2V023200.hex) that is freely available is distributd as an Intel HEX format file. An additional annoyance, Radare2 doesn't seem to have support for the PIC18F, PIC24, nor PIC16F I expect to see.
So I wrote [an interactive disassembler](`TODO link to pydare`) to help keep track of notes as I walk through this firmware.
Starting with the first few bytes of the firmware is as good as anywhere. As PIC18 instructions:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 16 @ 0' 'q' | aha --no-header --stylesheet
</pre></div>
`goto`s are interesting, as are the `BAD` - those are bytes that don't map to any PIC18 instruction.
The actual bytes there:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'px 32 @ 0' 'q'
</pre></div>
and comparing with the memory map in the datasheet for the `PIC18F2550`, reproduced partially in the table below:
```
---------------------------------------
0000h | Reset Vector
0002h | -
0004h | -
0006h | -
0008h | High Priority Interrupt Vector
000ah | -
..
0018h | Low Priority Interrupt Vector
---------------------------------------
```
Following the goto at the reset vector leads to more setup code:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'loaddb pk2cmd-stuff/pk2cmd_firmware' 'd 8 @ 0xf0a' 'q' | aha --no-header --stylesheet
</pre></div>
This loads [File Select Registers](../pic18.html#pic18f2550_fsr) 1 and 2 to defaults of 0x300, clears the table pointer, clears a bit, and calls some initialization functions.
Continuing through initialization functions from here seemed problematic, and it wasn't immediately obvious how this even leads to reading USB traffic from pk2cmd.
So while this eventually might lead to making sense of the firmware, at least trying to find USB accesses seems like a faster path to the script processing logic. The firmware is small, so disassembling the whole thing and searching for relevant SFR and instructions is quite tractable. From there it might be easy to reach the script interpreter, since that should be adjacent to the USB request handling logic.
Searching for USB-relevant SFR like
[UIR](../pic18.html#pic18f2550_fsr_uir),
[UIE](../pic18.html#pic18f2550_fsr_uie),
or [UCON](../pic18.html#pic18f2550_fsr_ucon) leads to the code around 0xb2a (notes included for reference):
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'loaddb pk2cmd-stuff/pk2cmd_firmware' 'd 50 @ 0xb2a' 'q' | aha --no-header --stylesheet
</pre></div>
In this code `UIE` and `UIR` are both tested to ensure that if an interrupt is seen by `UIR` it's also enabled in `UIE`. Then a call is made to the corresponding handler for the USB event that was seen. These handlers are all reasonable and mostly short. They also don't seem to provide much additional information on how data gets to the script interpreter, or where the script interpreter is.
In the process, this leads to some USB controller setup, which is still interesting:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'loaddb pk2cmd-stuff/pk2cmd_firmware' 'd 50 @ urstif_handler' 'q' | aha --no-header --stylesheet
</pre></div>
Only really interesting to know how the USB controller is configured. Nothing that seems to be relevant for the script interpreter or code that stores and indexes scripts.
Failing to find it through following USB processing logic, the next thought is to just look for what might be dispatch tables associated with the contiguous blocks of script command opcodes.
As luck would have it, there are three sequences of unconditional branches at these addresses:
* `0x2334`
* `0x3b0a`
* `0x429e`
These are good to keep in mind, and there's some logic that updates PC around them, which is a good sign these are in fact switch tables.
Also seen, and may come in handy, is bit bang-looking logic around `0x525c`, through `LATA` 3 and somethig to do with `TRISA` 4:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'loaddb pk2cmd-stuff/pk2cmd_firmware' 'd 40 @ 0x525c' 'q' | aha --no-header --stylesheet
</pre></div>
With some good leads, it should be easy to correlate these tables with script command implementations as a sanity check. The simplest script commands seem like `SCMD_BUSY_LED_OFF` and `SCMD_BUSY_LED_ON`, which are `0xf4` and `0xf5` respectively. Following traces on the board, these should just map to setting a bit in the appropriate latch high or low.
There are in fact several branch targets that seem like they would fill this role: everything between `0x3b60` and `0x3b92`:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 26 @ 0x3b60' 'q' | aha --no-header --stylesheet
</pre></div>
From looking at the board, it looks like the busy LED is pin 4 in `PORTB`. Searching `PORTB` yields a lot of `btfss` and `btfsc`, but searching `LATB` is entirely `bsf` and `bcf`. Further refining to `LATB, 4` yields 12 possibly interesting instructions.
Because these operations would be the result of command dispatch, and are adjacent codes, the implementations are probably close together. The only close pairs:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'loaddb pk2cmd-stuff/pk2cmd_firmware' 'd 6 @ 0x26ca' 'q' | aha --no-header --stylesheet
</pre></div>
and
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 4 @ 0x3b60' 'q' | aha --no-header --stylesheet
</pre></div>
Further reading around the jump tables, noticed some interesting constants before the first jump table:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 7 @ 0x22ae' 'q' | aha --no-header --stylesheet
</pre></div>
This is immediately eyecatching on account of 0x42 being `FWCMD_ENTER_BOOTLOADER`.
The subsequent `xorlw #0x34` functionally results in `W == W_original ^ 0x42 ^ 0x34`. Because xor is associative that can be read as `W == W_original ^ (0x42 ^ 0x34)`, or `W == W_original ^ (0x76)`. `0x76` is, itself, also interesting, as that's the opcode for `FWCMD_FIRMWARE_VERSION`.
The third xor at `0x22b6` is equivalent to `W == W_original ^ (0x42 ^ 0x34 ^ 0x2c)` aka `W == W_original ^ (0x5a)`. 0x53 maps to `FWCMD_NO_OPERATION`, so at this point this is definitely the start of the command interpreter loop.
At this point there's something to closely inspect, so it's worth investigating a structure seen many times in this firmware:
<div class="codebox"><pre>
; a call target
0x222c: movff FSR2L, POSTINC1
0x2230: movff FSR1L, FSR2L
; ^ what's up with this and why?
; and at the return... (different function)
0x7532: movf POSTDEC1, POSTDEC1
0x7534: movff INDF1, FSR2L
0x7536: return
</pre></div>
So at function entry the code stores the previous FSR2L to `[FSR1++]`, then `FSR1 => FSR2`. because the PIC18F2550 doesn't have a stack, this seems like a decent approximation of the same context-saving behavior. And of course, at return, context is restored.
This provides some illumination on another interesting idiom:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 6 @ 0x22d2' 'q' | aha --no-header --stylesheet
</pre></div>
Again, FSR1 is being used akin to a stack to pass parameters.
`setf POSTINC1` stores `0xFF` through `FSR1` and increments, the subsequent `movlw #0x4; movwf POSTINC1` provides a second argument, which aligns with the two `movf POSTDEC` after the call (undoing the increments to provide parameters)
Returning to the jump table-style code, register 0x54 is compared with 0xb9 (in the range of opcodes, though well into `SCMD_` space), and if the opcode is less than 0xb9, a branch is taken to `0x2306`:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 30 @ 0x2306' 'q' | aha --no-header --stylesheet
</pre></div>
At `0x2306` the opcode is turned into an offset into the subsequent jump table and added to PC, implementing the switch by opcode. At this point it's pretty reasonable to add labels for the cases to indicate what command they correspond to. Out of an abundance of caution though, it's worthwhile verfiying at least a simple command.
`FWCMD_RESET` gives us exactly that:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'loaddb pk2cmd-stuff/pk2cmd_firmware' 'd 3 @ FWCMD_RESET' 'q' | aha --no-header --stylesheet
</pre></div>
This case is the "reset" instruction plus logic to advance to the next opcode in the script. Of course, that never gets reached because the chip has reset, but that's a clear artifact of compilation.
#### FWCMD instructions to revisit
* `FWCMD_EXECUTE_SCRIPT`: this is relevant for programmer operation. all operations seem to be scripts.
* `FWCMD_RUN_SCRIPT`: how is this different from "execute"?
* `FWCMD_DOWNLOAD_SCRIPT`: this is the command to download a script to the programmer. can we read script memory? how many scripts can there be? what's their size?
* `FWCMD_DOWNLOAD_DATA`: seems interesting. download to programmer or to target?
* `FWCMD_UPLOAD_DATA`: same as above
* `FWCMD_END_OF_BUFFER`: might be interesting to know later
Will come back to these in the future, now that there's a plausible starting point to find their logic.
With `FWCMD` addressed, it's still not clear how `SCMD` commands are dispatched, which is probably a different table. Time to keep looking. Onward to the table around `0x3b0a`...
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 58 @ 0x3aec' 'q' | aha --no-header --stylesheet
</pre></div>
At `0x3aec` the opcode (again in register `0x54`) is compared against `0xd5`. If the opcode is that or above, `0x3af0` branches to a different but similar switch on opcode value. Since then it's safe to assume the entries are in order from command `0xd5` up, script command names apply cleanly to each branch:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'loaddb pk2cmd-stuff/pk2cmd_firmware' 'd 58 @ 0x3aec' 'q' | aha --no-header --stylesheet
</pre></div>
While if the opcode was below `0xd5`, the branch at `0x3af2` kicks in and takes us to...
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 54 @ 0x3af2' 'q' | aha --no-header --stylesheet
</pre></div>
Again, very similar structure to other tables, but starts with `0xb3`. `0xb3` is not in the header file and is unknown, but the rest can be filled in.
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'loaddb pk2cmd-stuff/pk2cmd_firmware' 'd 54 @ 0x3af2' 'q' | aha --no-header --stylesheet
</pre></div>
(note: at this point as i was keymashing i accidentally hit ctrl+c, which my script doesn't handle, and lost three hours of notes give or take. largely looking through the interpreter cases, and easy enough to rebuild, but beware sigint)
With the script command dispatcher found and marked up, it's time to return to the original questions:
* What script opcodes take how many parameters?
* How does programming actually take place?
## Script Opcodes
Easiest to start with obvious opcodes where the intended functionality is straightforward, to find unknown details about implementation. `SCMD_NOP24` is an obvious candidate here: functionally it *probably* does nothing, but does likely increment some pointers on the programmer side (current script opcode pointer, if such a thing exists), and possibly on the device side (to cause the target device to execute a no-op).
### `SCMD_NOP24`
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 25 @ 0x4112' 'q' | aha --no-header --stylesheet
</pre></div>
`SCMD_NOP24` makes four calls to the same function, with parameters that vary:
* `0x5066(4, 0)`
* `0x5066(8, 0)`
* `0x5066(8, 0)`
* `0x5066(8, 0)`
the sequence of three calls with `(8, 0)` probably correspond to the bytes `00 00 00`, which decode under PIC24 to a NOP. So this seems sane! The remaining question if this is true, is why is there a call with arguments `(4, 0)`?
The answer to that is a little further, at `0x5066`:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 53 @ 0x5066' 'q' | aha --no-header --stylesheet
</pre></div>
This function reads two parameters, stores the second parameter to `0x54`, and the first to `0x55`. As the first parameter for these calls is either 8 or 4, and the latter parameter is all 0's, this continues to make sense.
The function then tests if `0x2e1` > `0x2` and branches below if so.
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 12 @ 0x5084' 'q' | aha --no-header --stylesheet
</pre></div>
This sequence of instructions is responsible for putting pin 2 of latch A into a high or low state. Bit 0 in `0x54` can only be 1 or 0, so exactly one of `btfss` or `btfcs` will result in a fallthrough. As a result, only one of `bcf LATA, 2; bsf LATA, 2` will be executed. Afterward, the `nop; bsf LATA, 3; nop; bcf LATA, 3`, which raises waits, then lowers another GPIO pin. This makes sense given that the programmer is responsible for directly driving clock signals on the remote chip in programming modes, so this is the logic that actually sends data to the target chip, one bit at a time.
There is another region of this function that's similar save for one change. If the value in `0x2e1` was above 2, a different bit banging loop is reached, down at `0x509c`:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 20 @ 0x509c' 'q' | aha --no-header --stylesheet
</pre></div>
This loop is the same as above, as far as setting the data pin (LATA bit 2). The difference is that before alternating the clock three nops are used (rather than the earlier one nop), and those nops are executed in a loop. The loop repeats `[0x2e1]` times and functionally is the same bitbanging loop with longer periods between clock cycles. This may be relevant for some target devices that have different tolerances in serial programming, and makes sense to keep as a single "send these bits" abstraction.
Back to the question of the two arguments, while one is sent one bit at a time, the other seems to count the number of bits to send:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'd 2 @ 0x50c8' 'q' | aha --no-header --stylesheet
</pre></div>
This function is noted down as `tx_bits` for later reference.
So `(4, 0)` sends the bits `0000`, while `(8, 0)` sends `00000000`. Taken together `SCMD_NOP24` sends `0000b 000000`, and inferring from the PIC18 ICSP guide a leading `0000` prefix should indicate the following word is an instruction, yielding the expected `NOP` on the target device.
### `SCMD_COREINST24`
Moving on to `SCMD_COREINST24`, which functionally should send three user-specified bytes to the target...
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'define-function "tx_bits" @ 0x5066' 'd 2 @ 0x41b2' 'q' | aha --no-header --stylesheet
</pre></div>
As expected, this calls the same transmit function, and passes 4 and 8 in the same places. The major difference is how the second parameter is provided, where this version involves a bit of an incantation:
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'define-function "tx_bits" @ 0x5066' 'd 14 @ 0x41c6' 'q' | aha --no-header --stylesheet
</pre></div>
Since the three bytes for a pic24 instruction are part of the script (in fact, the three bytes after the `SCMD_COREINST24`), it stands to reason that the value passed is read out of the script buffer, tugging this thread should lead to the script buffer and eventually where it's selected. Interestingly, the logic appears to be:
* read the current script buffer offset (through `INDF2` at `0x41c6`)
* increment the offset (`0x41c8`)
* load the pointer to the script array (`0x41cc`-`0x41d4`) into `FRS0` - this comes from an argument!
* add the offset into the pointer (`0x41d8`)
* read that address (`0x41e0`)
* "push" the parameter (`0x41e2`)
This confirms that for the various commands we can track `INDF2` modification to determine the number of parameters for the various commands!
The first unknown script commands ones of interest: `SCMD_WRITE_BUFBYTE_W` and `SCMD_WRITE_BUFWORD_W`
### `SCMD_WRITE_BUFWORD_W`
```
TODO: SCMD_WRITE_BUFWORD_W
```
this handler involves the same transmit function from earlier and also another function, `0x5006`.
guessing from context it appears to read from a global address and return the byte in W. `SCMD_WRITE_BUFWORD_W` then uses that as bits to send down the wire.
### `SCMD_WRITE_BUFBYTE_W`
`SCMD_WRITE_BUFBYTE_W` does the same, but only reads one byte, the second value transmitted is a 0.
### Miscellaneous Script Commands
`SCMD_WRITE_BITS_LITERAL` takes two parameters, the number of bits and the pattern to send.
`SCMD_WRITE_BYTE_LITERAL` takes one parameter, the pattern, and assumes the number of bits to send is 8.
TODO: `SCMD_RD2_BYTE_BUFFER`.
TODO: `SCMD_READ_BYTE`.
TODO: `SCMD_VISI24`. (transmits (4, 1), (8, 0), then some unknown calls.)
### `SCMD_DELAY_LONG`
Passes the single byte parameter along to the function at `0x4966`, which...
<div class="codebox"><pre>
#eval python pk2cmd-stuff/pydare.py 'o pk2cmd-stuff/PK2V023200.hex' 'arch pic18' 'define-function "tx_bits" @ 0x5066' 'd 11 @ 0x4966' 'q' | aha --no-header --stylesheet
</pre></div>
At `0x496e` the flag indicating `timer0` has fired is cleared (since it may have in the past), followed by multiplying the argument by `0xff` and loading the low byte of the result into `tmr0h`? This really has no result other than to negate it before setting the timer. From there, the timer is enabled and the controller spins until the timer fires (`0x497e`).
`delay`. Dunno what else to have expected.
### `SCMD_LOOP`
TODO, eventually.
|