Re: [trimedia] about prefetch and memory copy
- I know that a version of memcpy optimized for the 1500 is in the
pipeline, but I do not have it yet. In general, copying a frame will
never be efficient. Best to re-organize your code so that frame copying
is not required. For instance, if the frame is required for reference,
keep a pointer and do not re-use the memory until both users (display
and reference) are done. Or if encoder reference code uses a different
memory format, change encoder to use TM optimal format.
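The "keep a pointer and do not re-use the memory until both users are done" idea can be sketched with a simple user count. This is only an illustration: the Frame type and the frame_acquire/frame_release names are hypothetical, not part of any TriMedia library, and the counter is assumed to be touched from one task at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame descriptor: names are illustrative, not a TriMedia API. */
typedef struct {
    unsigned char *pixels;   /* frame memory, shared and never copied */
    int            users;    /* outstanding users (display, reference, ...) */
} Frame;

/* Hand the frame to `count` users at once, e.g. 2 for display + reference. */
void frame_acquire(Frame *f, int count)
{
    assert(f != NULL && count > 0);
    f->users += count;
}

/* Called by each user when it is done with the frame.
 * Returns 1 when no user is left and the memory may be re-used, else 0. */
int frame_release(Frame *f)
{
    assert(f != NULL && f->users > 0);
    f->users--;
    return f->users == 0;
}
```

If display and encoder run in different tasks, the count would need to be protected by whatever locking your OS layer provides.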
Here is some general material about cache operations:
- If you are using the 1300, forget prefetch. Due to various problems
(hardware and SW support), most people just wrote it off.
- If you are using the 1500, it may help you. I do not have a source
code example on hand.
The following two cache control custom ops can be useful:
Tell cache that you are about to overwrite the (n) cache line(s)
at address a. The cache hardware can skip its read operation.
Tell cache to bring the n cache lines at address a into cache.
The TCS 4.4 compiler does respect the ordering of these ops in C code.
Older compilers did not.
Look at the assembly and check the prefetch counters to be sure the ops
are not discarded.
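As a rough illustration of where a prefetch hint usually goes, here is a sketch that hints the next cache line before working on the current one. It does not use the TriMedia custom ops: the GCC/Clang builtin __builtin_prefetch stands in for them, and both the 64-byte line size and the one-line prefetch distance are assumptions to be checked against your core's documentation.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 64   /* assumed cache line size; check your core's data book */

/* Stand-in for the target's prefetch custom op (assumption: GCC/Clang builtin). */
#if defined(__GNUC__)
#define PREFETCH_HINT(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_HINT(p) ((void)(p))
#endif

/* Sum a buffer one cache line at a time, hinting the next line before
 * working on the current one so the fetch can overlap the arithmetic. */
unsigned long sum_with_prefetch(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    size_t i, j;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i += LINE_BYTES) {
        size_t end = (len - i < LINE_BYTES) ? len : i + LINE_BYTES;
        if (i + LINE_BYTES < len)
            PREFETCH_HINT(buf + i + LINE_BYTES);  /* a hint only; may be dropped */
        for (j = i; j < end; j++)
            sum += buf[j];
    }
    return sum;
}
```

On the real target the macro would be replaced by the appropriate custom op, and the result is only meaningful once the counters confirm the prefetches are not being discarded.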
Understand that there are conditions under which the prefetch
instruction will be discarded (not executed). This is usually because
the resources needed for prefetch are already engaged by another cache
operation (load). To understand whether this is happening, you will
have to look at the assembly code and set up the counters to monitor
prefetch operations. This is tedious stuff, much akin to assembly
The cache counters are described in the 3260 architecture book. Search
for the "MEM_EVENTS" MMIO register. This selects one of 15 items to
count. Two counters are typically used to monitor these events. You
must select which ones. I have known programmers to run the same data 8
times so as to get the contents of all counters.
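The "run the same data 8 times" trick is just 15 events split over two counters. A small sketch of planning those passes follows. The CounterRun type and plan_counter_runs name are hypothetical, the event codes are supplied by the caller, and the actual write to MEM_EVENTS is target-specific and left out.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COUNTERS 2   /* the post says two counters are typically used */

/* One measurement pass: which event each counter is set to count.
 * -1 means the counter is unused in this pass. */
typedef struct {
    int event[NUM_COUNTERS];
} CounterRun;

/* Split `num_events` event codes into passes of NUM_COUNTERS events each.
 * Returns the number of passes written to `runs` (capacity `max_runs`). */
size_t plan_counter_runs(const int *events, size_t num_events,
                         CounterRun *runs, size_t max_runs)
{
    size_t n = 0, i, c;

    assert(events != NULL && runs != NULL);
    for (i = 0; i < num_events; i += NUM_COUNTERS) {
        assert(n < max_runs);
        for (c = 0; c < NUM_COUNTERS; c++)
            runs[n].event[c] = (i + c < num_events) ? events[i + c] : -1;
        n++;
    }
    return n;
}
```

For each planned pass you would select the two events, process the same input data, and read both counters back before moving on to the next pass.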
Some other things to know about prefetch:
- The PREFETCH instruction asks the cache to fetch data into data cache
from memory without waiting
- Requires the copyback buffer for 5 cycles to determine LRU (i.e., a
prefetch could be discarded because a copyback is in progress)
- Requires the refill buffer to do the actual data fetch (i.e., a prefetch
could be discarded because another cache fetch is in progress)
- Will be discarded if buffers are not free
- Only one prefetch can be scheduled at a time
- The fetched data replaces the least recently used (LRU) entry in the cache
- Additional stalls can be avoided if next 5 instructions do not use the
And finally, more details about what is actually counted by the prefetch
counters:
1011: Prefetch operation:
Counts one when a prefetch operation is requested
1100: Prefetch operation discarded (because of cache hit or no resources).
Counts one when a prefetch operation is discarded for any reason
1101: Prefetch operation discarded (because of cache hit).
Counts one when a prefetch is discarded because of a cache hit.
1110: Instruction cache prefetch.
1111: Data cache prefetch
Hope this helps.
TriMedian at MDS www.mds.com
> How to use prefetch such as prefr?
> I use them in my code, but it takes more time to run my encoder.
> Could anyone give me an example showing where I can use them?
> Also, to copy an image to a new one, I use memcpy() to copy a
> line at a time; it takes 2.5s to copy a 640x480 image.
> How can I save the time?