Re: Patches (NXWidgets, STM32 DMA)

  • petteriaimonen
    Message 1 of 14 , Jun 4, 2013
      Hi,

      > Wouldn't it be possible to ensure that the transfer buffers (can mean adding another buffer) are allocated in non-CCM memory, and then do a memcpy() to the final destination once the DMA transfer has completed? That would not be terrible at all regarding the overhead, and it wouldn't be terrible regarding the required buffer size either.

      Hmm, for example the SPI driver doesn't know the size of the buffer in advance and may handle quite large transfers (512 bytes to SD card is common). But I guess one could transfer a small buffer multiple times.

      > In our autopilot use case your implementation means that if something slightly changes during system boot, we might get into the CCM memory region and the CPU load might jump from OK to overloaded (SPI DMA vs non DMA) - this is something that would be quite dangerous.

      Currently you would jump from OK to not working at all. If you allocate the DMA buffers correctly, the patch does not change behaviour at all.

      But I guess this could be configurable, so that instead of automatic fallback you can choose to get an error (maybe with better debug message than currently).
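
A minimal sketch of the check and the configurable fallback discussed here. This is not the actual patch: the CCM window (64 KiB at 0x10000000, as on the STM32F405/407), the names buffer_dmacapable() and choose_mode(), and the strict flag are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed CCM window: 64 KiB at 0x10000000 (STM32F405/407).
 * Check the reference manual for the part actually in use. */
#define CCM_BASE ((uintptr_t)0x10000000u)
#define CCM_SIZE ((uintptr_t)0x00010000u)

enum xfer_mode { XFER_DMA, XFER_POLLED, XFER_ERROR };

/* True if no byte of [addr, addr + len) lies inside the CCM window. */
bool buffer_dmacapable(uintptr_t addr, size_t len)
{
  assert(len > 0);
  uintptr_t end = addr + (len - 1);   /* address of the last byte */
  assert(end >= addr);                /* reject address wrap-around */
  return end < CCM_BASE || addr >= CCM_BASE + CCM_SIZE;
}

/* Hypothetical policy switch: either fall back silently to polled I/O,
 * or report an error so that a misplaced buffer gets noticed. */
enum xfer_mode choose_mode(uintptr_t addr, size_t len, bool strict)
{
  if (buffer_dmacapable(addr, len))
    {
      return XFER_DMA;
    }

  return strict ? XFER_ERROR : XFER_POLLED;
}
```

With strict set to false a buffer in CCM quietly takes the polled path; with strict set to true the caller gets an error it can log.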

      --

      To elaborate on the situation where I'm using the patch: I have configured thread stacks to go into CCM (by modifying up_create_stack() for my board). This way I can keep the heap and everything else in normal RAM.

      However, trouble arises when some touchscreen drivers etc. have small transfer buffers that are allocated on the stack. Instead of going around modifying those and hoping to catch them all, I wrote this patch. It allows one to enable DMA without worrying about anything breaking.

      --
      Petteri
    • Meier Lorenz
      Message 2 of 14 , Jun 4, 2013
        Hi Petteri,

        Sorry, I wasn't clear: I'm aware that it won't work from CCM RAM, and I was already trying to plan for a fix for that. Having it at least not failing is certainly better, and so the patch is in any case worthwhile. It just seems that putting an end to the allocation fragility altogether is quite within reach (for you, I admit that it would take me time to get into it and to get it right).

        I'm just still wondering if a (configurable) intermediate buffer wouldn't be space well spent - one UART already consumes 256 bytes by default, so having 1-2 buffers per SPI bus, and for CCM RAM dropping from a direct transfer to 2x DMA + 2x memcpy(), sounds like a good deal.

        Since you already have the detection in place, could I maybe tease you (mhh, I would offer to buy you dinner if you ever come to Zurich) into adding an option to allocate a static buffer in the non-CCM region on a configuration flag?

        Obviously I can make sure to do that on my end correctly for starters, but it sounds so hardware coupled that I would think the driver layer is the intuitive location, so that application-level code doesn't need to take care to do the allocation the right way.

        -Lorenz


      • Gregory N
        Message 3 of 14 , Jun 4, 2013
          Hi, Petteri,

          Sorry to be slow to respond. I have been suffering from ISP problems this morning. I have checked in all of the changes with the following differences:

          1) 0003: Can't the redraw() method be inline?
          2) 0004-0005: I added a setting CONFIG_STM32_DMACAPABLE to enable or disable the feature

          Petteri, Lorenz,

          > Sorry, I wasn't clear: I'm aware that it won't work from CCM RAM, and I was already trying to plan for a fix for that. Having it at least not failing is certainly better, and so the patch is in any case worthwhile. It just seems that putting an end to the allocation fragility altogether is quite within reach (for you, I admit that it would take me time to get into it and to get it right).
          > ...

          I think Petteri's patch looks pretty good, although I did make it a configuration option. Only the STM32 F4 has CCM memory, so the other platforms should not need this change (although I have experienced problems DMAing with FSMC SRAM in one direction on the F1). For some of the tiny STM32 platforms, we really have to be careful.

          If there is a common solution, I would like to understand what that solution is. There are a number of 'ad hoc' things being done now:

          1. For the F4, there is an option to exclude CCM memory from the heap (CONFIG_STM32_CCMEXCLUDE). If you have other uses for the CCM memory then that is the best solution.

          2. Some drivers allocate their own DMA buffers and have special callouts to assure that they allocate from DMA memory. The idea is that you can provide a special DMA memory allocator to provide the right kind of memory. The following drivers take this approach:

          CONFIG_FAT_DMAMEMORY - This will cause the FAT file system to allocate its internal, sector I/O buffers using a special DMA buffer allocator.

          CONFIG_USBDEV_DMAMEMORY - This will cause *all* device controller drivers to allocate memory using the special DMA buffer allocator.

          In mm/, I have implemented a special "granule" allocator which is intended to allocate aligned buffers from special memory pools. This is the recommended way to implement DMA allocators. The current malloc logic is also capable of supporting allocations from different heaps and could be used for that purpose as well, using less FLASH memory.

          3. The LPC17xx and others have this same DMA memory issue. Usually it is the opposite problem: only one memory region is capable of DMA and the others are not. But I would not recommend what I did in that case, which was to hard-code pre-allocated DMA buffers in this special memory. I wish I had used a DMA allocator for this platform.
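
To illustrate the granule idea from item 2, here is a toy bitmap allocator. It is a sketch only, not the code in mm/ and not its interface: the toy_gran_* names, the 64-byte granule, and the 32-granule pool are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRAN_LOG2  6                    /* 64-byte granules (assumed) */
#define GRAN_SIZE  (1u << GRAN_LOG2)
#define GRAN_COUNT 32                   /* 2 KiB pool, one bit per granule */

struct toy_gran
{
  uint8_t *base;   /* start of the pool, aligned to GRAN_SIZE */
  uint32_t used;   /* bit n set => granule n is allocated */
};

/* Example pool; on a target it would be placed in DMA-capable memory. */
_Alignas(64) uint8_t g_pool[GRAN_COUNT * GRAN_SIZE];
struct toy_gran g_gran;

/* Bit mask with the low n bits set, for 1 <= n <= 32. */
static uint32_t run_mask(unsigned int n)
{
  return (n == 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
}

void toy_gran_init(struct toy_gran *g, void *pool)
{
  assert(((uintptr_t)pool & (GRAN_SIZE - 1)) == 0);
  g->base = pool;
  g->used = 0;
}

/* First-fit search for a run of free granules; NULL if none fits. */
void *toy_gran_alloc(struct toy_gran *g, size_t size)
{
  if (size == 0 || size > (size_t)GRAN_COUNT * GRAN_SIZE)
    {
      return NULL;
    }

  unsigned int n = (unsigned int)((size + GRAN_SIZE - 1) >> GRAN_LOG2);
  for (unsigned int i = 0; i + n <= GRAN_COUNT; i++)
    {
      if ((g->used & (run_mask(n) << i)) == 0)
        {
          g->used |= run_mask(n) << i;
          return g->base + ((size_t)i << GRAN_LOG2);
        }
    }

  return NULL;
}

/* The caller passes the size back, so no per-block header is needed. */
void toy_gran_free(struct toy_gran *g, void *mem, size_t size)
{
  unsigned int n = (unsigned int)((size + GRAN_SIZE - 1) >> GRAN_LOG2);
  unsigned int i = (unsigned int)(((uint8_t *)mem - g->base) >> GRAN_LOG2);
  assert(n > 0 && i + n <= GRAN_COUNT);
  g->used &= ~(run_mask(n) << i);
}
```

Because every block starts on a granule boundary, alignment comes for free, and rounding all sizes to whole granules is what limits fragmentation.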

          In my opinion, special-casing addresses and copying from non-DMA to DMA buffers feels kludgey (I know that there is some driver that copies between buffers now, but I can't remember which).

          So I think that in the longer run, the solution is to support memory pools and special allocators to get memory from the different memory pools. A DMA memory pool is one example. The only issue is that malloc() does not support the additional parameters needed to specify the memory pool. Inside of NuttX, that is not an issue because the OS has its own internal allocators and they could be extended with some special arguments to select memory pools.

          The SPI case is different, however, because as Petteri notes, you very often create a tiny buffer of a few bytes on the stack and do the SPI I/O through those buffers. Another way that this could be solved would be to ensure that stacks are not allocated from CCM memory, another form of memory pool.

          Non-DMA SPI will not be a random occurrence; if the task that performs SPI I/O has its stack in CCM SRAM, then DMA will never be performed on those transfers. The only way to avoid that problem would be to allocate the stack in non-CCM memory using a special, OS memory allocator.

          Petteri has his solution in hand now. I am open to discussion or recommendations on the best approach going forward.

          Greg
        • petteriaimonen
          Message 4 of 14 , Jun 4, 2013
            Hi,

            What would this sound like as a long-term plan:

            1) Have a platform-independent way to allocate a buffer for DMA use. For example dma_malloc() and dma_free(). I don't particularly care whether it uses granule allocator or just a separate malloc heap.

            The important thing is that it should be a system-wide DMA pool, which would not need any extra parameters at the call site. It could also be configured to map back to normal malloc() where separation is not needed.

            2) Have a per-platform function like stm32_dmacapable() to check whether DMA can be used for a given buffer.

            3) In every DMA-capable device driver, check incoming buffers. If they are DMA-capable, use them directly. Otherwise allocate a temporary buffer using dma_malloc(), memcpy() over the data and then do the transfer from that temp buffer. There should probably be a configurable maximum size for the buffers that it tries to allocate, and do the transfer in multiple parts if it is larger than that.

            Advantages:
            + Able to execute other tasks while the transfer is ongoing, as the memcpy() is likely to be faster than the peripheral.
            + Transparent operation with any kind of buffer.
            + Able to get rid of the strange fat_dmaalloc() etc. hacks.

            Disadvantages:
            - How to handle dma_malloc() out-of-memory conditions?
            - dma_malloc() becomes quite speed critical.
            - What if there comes some platform where there are e.g. two DMA controllers with separate SRAMs?
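
Step 3 of the plan could look roughly like the following host-runnable sketch. To keep it short it swaps the proposed dma_malloc() for a fixed static bounce buffer, and everything in it is a hypothetical stand-in: fake_ccm plays the memory the DMA controller cannot reach, hw_dma_send() plays the real (here synchronous) transfer, and BOUNCE_MAX is the configurable maximum size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_MAX 64   /* assumed cap on the temporary buffer size */

uint8_t fake_ccm[256];  /* stand-in for non-DMA-capable memory */
uint8_t wire[512];      /* bytes "sent" so far */
size_t wire_len;
unsigned int dma_calls;

/* Stand-in for a per-platform check such as stm32_dmacapable(). */
bool dmacapable(const void *buf, size_t len)
{
  uintptr_t p = (uintptr_t)buf;
  uintptr_t lo = (uintptr_t)fake_ccm;
  return p + len <= lo || p >= lo + sizeof(fake_ccm);
}

/* Stand-in for starting a DMA transfer and waiting for completion. */
void hw_dma_send(const void *buf, size_t len)
{
  assert(dmacapable(buf, len));
  assert(wire_len + len <= sizeof(wire));
  memcpy(wire + wire_len, buf, len);
  wire_len += len;
  dma_calls++;
}

/* Send len bytes, copying through the bounce buffer only when needed. */
void spi_send(const void *buf, size_t len)
{
  static uint8_t bounce[BOUNCE_MAX];   /* must be DMA-capable memory */
  const uint8_t *src = buf;

  if (dmacapable(buf, len))
    {
      hw_dma_send(buf, len);           /* direct, zero-copy path */
      return;
    }

  while (len > 0)
    {
      /* Split the transfer into parts that fit the bounce buffer. */
      size_t n = len < BOUNCE_MAX ? len : BOUNCE_MAX;
      memcpy(bounce, src, n);
      hw_dma_send(bounce, n);
      src += n;
      len -= n;
    }
}
```

A 150-byte buffer in fake_ccm goes out as three transfers (64 + 64 + 22 bytes), while a buffer anywhere else is sent in a single call. A real driver would have to wait for each DMA completion before refilling the bounce buffer, and a receive path would copy in the other direction afterwards.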

            --
            Petteri
          • Meier Lorenz
            Message 5 of 14 , Jun 5, 2013
              Greg,

              Before I contribute more feedback I'd like to test a bit more with the current state, to make sure I don't waste your and Petteri's time. The only comment I have right now is on CONFIG_STM32_CCMEXCLUDE - this is a nice 'quick fix' for OS-level development, but if you really have an application where you can leverage NuttX AND the F4 core (like we do), killing almost half of the available SRAM really hurts. Having the CCM RAM enabled is not really a special case.

              When I enable FAT DMA allocation, I get fat_dma_alloc/free as undefined symbols. If I search the NuttX tree, this function is not implemented for any architecture or board - I would assume that you have prototyped it at least for one platform?

              If not, is there some other work I can build on, or do I need to provide these functions as custom code?

              Thanks!

              -Lorenz



            • Gregory N
              Message 6 of 14 , Jun 5, 2013
                Hi, Lorenz,

                > Before I contribute more feedback I'd like to test a bit more with the current state, to make sure I don't waste your and Petteri's time. The only comment I have right now is on CONFIG_STM32_CCMEXCLUDE - this is a nice 'quick fix' for OS-level development, but if you really have an application where you can leverage NuttX AND the F4 core (like we do), killing almost half of the available SRAM really hurts. Having the CCM RAM enabled is not really a special case.
                >
                > When I enable FAT DMA allocation, I get fat_dma_alloc/free as undefined symbols. If I search the NuttX tree, this function is not implemented for any architecture or board - I would assume that you have prototyped it at least for one platform?

                No, but other people have used it. I don't plan to standardize the allocator because it is platform specific and so tied to configurations and product-specific resource usage.

                > If not, is there some other work I can base on, or do I need to provide these functions as custom code?

                You need to provide this function in your custom code.

                It was designed to mate with the granule allocator in mm/ described in http://sourceforge.net/p/nuttx/git/ci/master/tree/nuttx/mm/README.txt with interfaces prototyped in include/nuttx/gran.h. The granule allocator has advantages in that it provides naturally aligned memory and if the granule size is picked properly, it is a little more immune to fragmentation.

                Probably a lower-footprint way is to create another instance of the heap allocator. Hmmm... I don't think I documented how to do that, but it will involve defining CONFIG_MM_MULTIHEAP=y and using the interfaces prototyped in include/nuttx/mm.h.

                Use mm_initialize() to initialize an mm_heap_s structure. Then call mm_malloc(), mm_realloc(), mm_free(), etc. passing it the heap structure. The standard malloc(), realloc(), free() use this same mechanism, but with a global heap structure at g_mmheap.
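
The pattern described here (a heap instance passed explicitly, with thin wrappers around a global instance) can be sketched on a host as follows. This deliberately does not use the interfaces in include/nuttx/mm.h: the toy_heap_* names, the bump allocation without free(), and the region sizes are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap instance. A real heap also supports
 * free() and coalescing, which this bump allocator leaves out. */
struct toy_heap
{
  uint8_t *start;
  size_t size;
  size_t used;
};

void toy_heap_init(struct toy_heap *h, void *start, size_t size)
{
  h->start = start;
  h->size = size;
  h->used = 0;
}

void *toy_heap_alloc(struct toy_heap *h, size_t size)
{
  size_t need = (size + 7u) & ~(size_t)7u;   /* keep 8-byte alignment */
  if (need < size || need > h->size - h->used)
    {
      return NULL;                           /* overflow or heap full */
    }

  void *p = h->start + h->used;
  h->used += need;
  return p;
}

/* Two independent instances: one for general use, one for CCM.
 * On a target the arrays would be replaced by the real regions. */
_Alignas(8) uint8_t main_ram[1024];
_Alignas(8) uint8_t ccm_ram[256];
struct toy_heap g_main_heap;
struct toy_heap g_ccm_heap;

void heaps_init(void)
{
  toy_heap_init(&g_main_heap, main_ram, sizeof(main_ram));
  toy_heap_init(&g_ccm_heap, ccm_ram, sizeof(ccm_ram));
}

/* Thin wrappers hide the handle, as malloc() does for the global heap. */
void *main_alloc(size_t size)
{
  return toy_heap_alloc(&g_main_heap, size);
}

void *ccm_alloc(size_t size)
{
  return toy_heap_alloc(&g_ccm_heap, size);
}
```

Each heap runs out independently: exhausting the CCM instance leaves the general-purpose one untouched.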

                I will update the mm/README.txt to discuss this a little more.

                Greg
              • petteriaimonen
                Message 7 of 14 , Jun 6, 2013
                  Hi,

                  > Before I contribute more feedback I'd like to test a bit more with the current state, to make sure I don't waste your and Petteri's time. The only comment I have right now is on CONFIG_STM32_CCMEXCLUDE - this is a nice 'quick fix' for OS-level development, but if you really have an application where you can leverage NuttX AND the F4 core (like we do), killing almost half of the available SRAM really hurts. Having the CCM RAM enabled is not really a special case.

                  The best workaround I have found with the current code is to have thread stacks in CCM, and everything else in normal RAM. This way there is no need to mess with the DMA allocators for FAT. (One big problem with the FAT DMA allocator approach is that it will still sometimes pass the caller's buffer straight through, if its size is exactly a multiple of 512 bytes.)

                  To accomplish this, I have done the following:

                  1) Set config to:
                  CONFIG_STM32_CCMEXCLUDE=y
                  CONFIG_MM_MULTIHEAP=y

                  2) Then have a separate heap for CCM, and replace up_create_stack() so that it allocates from there. This can be done by including this file in board configuration:
                  http://koti.kapsi.fi/jpa/stuff/other/up_ccmstack.c

                  --
                  Petteri
                • Gregory N
                  Message 8 of 14 , Jun 6, 2013
                    Hi, Petteri,

                    > > Before I'll contribute more feedback I'd like to test a bit more with the current state, to make sure I don't waste your and Petteris time. The only comment I have right now is on CONFIG_STM32_CCMEXCLUDE - this is a nice 'quick fix' for OS-level development, but if you really have an application where you can leverage NuttX AND the F4 core (like we do), killing almost half of the available SRAM really hurts. Having the CCM RAM enabled is not really a special case.
                    >
                    > The best workaround with the current code I have found is to have thread stacks in CCM, and everything else in normal RAM. This way there is no need to mess with the DMA allocators for FAT. (One big problem with the FAT DMA allocator thing is that it will still sometimes pass directly through the buffer you are passing, if it is exactly a multiple of 512 bytes.)
                    >
                    > To accomplish this, I have done the following:
                    >
                    > 1) Set config to:
                    > CONFIG_STM32_CCMEXCLUDE=y
                    > CONFIG_MM_MULTIHEAP=y
                    >
                    > 2) Then have a separate heap for CCM, and replace up_create_stack() so that it allocates from there. This can be done by including this file in board configuration:
                    > http://koti.kapsi.fi/jpa/stuff/other/up_ccmstack.c

                    I think I am missing something. Your logic looks like it will prefer the CCM memory when allocating stacks. Is that what you want to do? I thought you were trying to get the stacks out of CCM memory so that you can DMA from the stack?

                    Here is another way I thought about the problem:

                    1) Create two heaps. One allocates from DMA-able memory. Let's say we can allocate from it using dma_alloc().

                    2) A second one allocates from CCM memory. Let's say ccm_alloc().

                    You did these same things, except that the first was called malloc() and the second ccm_alloc().

                    3) Then replace malloc() with logic like:

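                    #include <stddef.h>

                    void *ccm_alloc(size_t size);   /* allocates from the CCM heap */
                    void *dma_alloc(size_t size);   /* allocates from the DMA-able heap */

                    void *malloc(size_t size)
                    {
                      /* Prefer CCM (non-DMA) memory for general allocations. */
                      void *ret = ccm_alloc(size);
                      if (!ret)
                        {
                          /* CCM is exhausted; fall back to DMA-able memory. */
                          ret = dma_alloc(size);
                        }

                      return ret;
                    }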

                    In that way malloc() will prefer the non-DMA memory. For general allocations, it will hand out non-DMA memory if it can and DMA memory if it cannot.

                    If you want DMA memory, then you can call dma_alloc(). I would think that up_create_stack() would want DMA-able memory. But maybe I do not fully understand.

                    Greg
                  • petteriaimonen
                    Message 9 of 14 , Jun 7, 2013
                      Hi,

                      > I thought you were trying to get the stacks out of CCM memory so that you can DMA from the stack?

                      No. I want the heap out of CCM so that I can DMA from any malloc()ed buffer. Then I want the stacks in CCM so that I have more free heap. Buffers are quite rarely allocated on the stack, except for those small SPI command buffers.

                      --
                      Petteri
                    • Meier Lorenz
                      Message 10 of 14 , Jun 7, 2013
                        Hi,

                        I played with it a bit yesterday evening, and I needed to disable DMA for SPI1 but enable it for SPI3. Since I set out for a quick test, I disabled CCM RAM and then added support to disable DMA per-bus. I bet Petteri has a similar setup, and the way he allocates stacks buys him the same behavior: no DMA for sensors / small transfers, but DMA for the microSD card.

                        To wrap this up, I think what we really want is:

                        1) Greg, I do deeply respect your architectural skills, but I'm still not convinced that the right way to attack this problem is to force the user-space code to allocate their buffers in the right memory. I will explain in a minute why this won't work for us, but my main argument is that you're partially ruining all the hard work you've put into hardware abstraction to save some buffer space and a call to memcpy(). Obviously that should only be done if required.

                        2) We don't want to use DMA for all transfer sizes. Setting up a DMA transfer for a few bytes just defeats the purpose and makes things slower.


                        Now let me quickly give you some background to my use case:

                        - We're using the full F4 internal SRAM. We can't split stack or heap usage between the two RAM regions, because we don't have a default set of applications, but rather operate NuttX like a real Unix, and so we'll typically have 5-10 apps running on their own stack or heap allocations, using up 50-80% of the RAM. Because our apps only partially overlap between systems / boots (fixed wing aircraft vs. multirotors vs. bench testing), we can't allocate their internal buffers statically, and because it's a larger development community with limited embedded (and more aeronautical) knowledge, we can't really enforce how they allocate their memory. We also have to assume they don't know there is a DMA allocator.

                        - We're running SPI calls in interrupt context in our drivers (triggered by a timer interrupt), and most of our transfers are really tiny (6 bytes), but happen at 400-800 Hz. We don't want DMA for those.

                        - We have large buffers for an ultra-low-priority logging app, which only runs when the rest of the flight control code is idle. This is the only component (with its own bus) which should transfer in DMA mode.

                        - We have a reasonable number of processes that write files to the microSD. All those should use DMA, but not require knowledge about the DMA constraints from the application developer.


                        Interestingly, any "Unix" use case for NuttX will be similar - some sensors or touchpad controllers with small, quick SPI transfers, some larger SPI transfers (displays, storage) and no possibility to statically divide the memory.


                        Consequently, I think Petteri's last proposal was about perfect, and I have only very minor additions:

                        - Let NuttX for the STM32 default to dma_capable() on. Let it check: 1) memory region, 2) transfer size, 3) interrupt context. If it reports a large transfer, no interrupt context, but the wrong memory type, try to malloc() a buffer, using the allocator Greg discussed, if a tradeoff transfer size is met (e.g. > 100 bytes).
                        - Allow to completely disable DMA per-bus (I would have a patch for that).
                        - Remove all special DMA handling in higher level drivers (e.g. FAT)
                        - Enable CCM as default (it's not right now)
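
The checks in the first two items can be folded into one decision function. The sketch below is one possible reading of the proposal, not an existing NuttX interface: the names, the struct layout, and the 100-byte break-even value (taken from the "e.g. > 100 bytes" above) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DMA_MIN_BYTES 100   /* assumed break-even size for using DMA */

enum xfer_path
{
  PATH_POLLED,       /* CPU moves the bytes, no DMA setup cost */
  PATH_DMA_DIRECT,   /* DMA straight from the caller's buffer */
  PATH_DMA_BOUNCE    /* copy via a DMA-capable buffer, then DMA */
};

struct xfer_request
{
  size_t len;             /* transfer size in bytes */
  bool buf_dmacapable;    /* result of the memory-region check */
  bool in_interrupt;      /* called from interrupt context? */
  bool bus_dma_enabled;   /* per-bus DMA switch */
};

enum xfer_path choose_path(const struct xfer_request *r)
{
  assert(r != NULL);

  /* Small transfers, interrupt context, or a bus with DMA disabled
   * all stay on the polled path. */
  if (!r->bus_dma_enabled || r->in_interrupt || r->len <= DMA_MIN_BYTES)
    {
      return PATH_POLLED;
    }

  /* Large transfer from task context: DMA directly when the buffer
   * allows it, otherwise go through an intermediate buffer. */
  return r->buf_dmacapable ? PATH_DMA_DIRECT : PATH_DMA_BOUNCE;
}
```

Under these assumptions a 6-byte sensor read from a timer interrupt stays polled, while a 512-byte sector write from a task uses DMA, directly or through a bounce buffer depending on where the caller's buffer lives.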


                        I think this would substantially simplify the average DMA setup on the F4 SPI (I would recommend the same strategy for SDIO), it would bring back almost half of its RAM (which is currently disabled by default), and it would spare users from having to scatter special DMA allocation calls across the top level of the OS.

                        I admit it would come at the expense of a memcpy() operation, but in particular when the ARMv7-M custom version is enabled, this is really a tiny overhead.

                        My main point is still that this preserves the POSIX development model and keeps the NuttX API clean, and that the required code changes stay within the STM32 drivers (where they belong).

                        -Lorenz


                        ------------------------------------------------------
                        Lorenz Meier
                        Institute for Visual Computing
                        ETH Zurich
                        http://www.inf.ethz.ch/personal/lomeier/



                      • Gregory N
                        Message 11 of 14 , Jun 7, 2013
                          Lorenz, Petteri,

                          > ... snip ...
                          > Consequently, I think Petteri's last proposal was about perfect, and I have only very minor additions:
                          >
                          > - Let NuttX for the STM32 default to dma_capable() on. Let it check: 1) memory region, 2) transfer size, 3) interrupt context. If it reports a large transfer, no interrupt context, but the wrong memory type, try to malloc() a buffer, using the allocator Greg discussed, if a tradeoff transfer size is met (e.g. > 100 bytes).
                          > - Allow to completely disable DMA per-bus (I would have a patch for that).
                          > - Remove all special DMA handling in higher level drivers (e.g. FAT)
                          > - Enable CCM as default (it's not right now)

                          If someone wants to work up a patch and send it to me, I will review and incorporate it. I would consider Petteri's changes to be the perfect proof-of-concept, but more would have to be done before I could incorporate the changes. Some general comments, issues, guidelines, requirements, ...:

                          - Any new configurations should default to no changes. I don't want new features to be enabled by default on 'make oldconfig'. If someone wants to use a new feature it should be explicitly enabled and never enabled by default.

                          - No changes to the higher level drivers. Those have scope bigger than STM32. FAT is the only higher level driver affected (the STM32 USB device driver is another driver that has DMA allocators, but it is not 'higher'). I do not want to remove that logic because it is still needed on other platforms. It will cause you no problems because that logic is only enabled if you do so in the configuration.

                          - Anything to do with CCM must be constrained to the arch/arm/src/stm32 directory and must be conditioned on STM32 F4. Generalizing the capability by enabling it with CONFIG_ARCH_STACK_ALLOCATOR might be better. Then it is a general concept that makes sense to the rest of the world and maps to the functionality that you want. The fact that it allocates from CCM memory is purely STM32 F4 internal knowledge.

                          So the allocator itself would need to reside in arch/arm/src/stm32

                          - The allocator name should also be general. Like up_stackalloc(). Then the generic stack creation logic in arch/arm/src/common can use it.

                          Hmmm... perhaps there is a better name since you might want to use it for other things as well????

                          - The stack for the NULL task would not lie in CCM memory, but that should not be an issue because it is never allocated, never freed, and the NULL task probably will never do DMA.
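
One way the generic-allocator idea above might look is sketched below. The name up_stackalloc() comes from the suggestion above, but its signature, the up_stackfree() counterpart, struct stack_info, and create_stack() are all assumptions made for illustration; this is not existing NuttX code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook. When CONFIG_ARCH_STACK_ALLOCATOR is set, the
 * architecture code (e.g. under arch/arm/src/stm32) would supply
 * these; otherwise stacks come from the normal heap.
 */

#ifdef CONFIG_ARCH_STACK_ALLOCATOR
void *up_stackalloc(size_t size);
void up_stackfree(void *stack);
#else
static inline void *up_stackalloc(size_t size)
{
  return malloc(size);
}

static inline void up_stackfree(void *stack)
{
  free(stack);
}
#endif

/* Illustrative stand-in for the generic stack creation logic. */
struct stack_info
{
  void *base;
  size_t size;
};

int create_stack(struct stack_info *info, size_t size)
{
  /* Round the request up to a multiple of 8 bytes. */

  size = (size + 7u) & ~(size_t)7u;

  info->base = up_stackalloc(size);
  if (info->base == NULL)
    {
      info->size = 0;
      return -1;
    }

  info->size = size;
  return 0;
}
```

The common code only ever sees up_stackalloc(); whether the memory comes from CCM stays private to the STM32 F4 implementation, and with the option disabled the behaviour is the same as a plain malloc().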

                          Greg
                        • Michael Smith
                          Message 12 of 14 , Jun 8, 2013
                          • 0 Attachment

                            This will still give you grief when someone tries to read/write a local buffer on the stack.

                            On Jun 6, 2013, at 2:47 AM, petteriaimonen <petteri.aimonen@...> wrote:

                             

                            Hi,

                            > Before I'll contribute more feedback I'd like to test a bit more with the current state, to make sure I don't waste your and Petteri's time. The only comment I have right now is on CONFIG_STM32_CCMEXCLUDE - this is a nice 'quick fix' for OS-level development, but if you really have an application where you can leverage NuttX AND the F4 core (like we do), killing almost half of the available SRAM really hurts. Having the CCM RAM enabled is not really a special case.

                            The best workaround with the current code I have found is to have thread stacks in CCM, and everything else in normal RAM. This way there is no need to mess with the DMA allocators for FAT. (One big problem with the FAT DMA allocator thing is that it will still sometimes pass directly through the buffer you are passing, if it is exactly a multiple of 512 bytes.)

                            To accomplish this, I have done the following:

                            1) Set config to:
                            CONFIG_STM32_CCMEXCLUDE=y
                            CONFIG_MM_MULTIHEAP=y

                            2) Then have a separate heap for CCM, and replace up_create_stack() so that it allocates from there. This can be done by including this file in board configuration:
                            http://koti.kapsi.fi/jpa/stuff/other/up_ccmstack.c
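
As a rough illustration of step 2 (this is not the contents of the linked file), the sketch below carves stacks out of a separate fixed region instead of the main heap. The static array only stands in for CCM so that the code runs on a host; on the target the region would be the CCM block itself. The names ccm_stack_alloc() and ccm_stack_free_bytes() are hypothetical, and this simple bump allocator never frees, which a real second heap would need to do.

```c
#include <stddef.h>
#include <stdint.h>

#define CCM_POOL_SIZE (64u * 1024u)

/* Stand-in for the CCM region (assumption: 64 KiB, 8-byte aligned). */
static _Alignas(8) uint8_t g_ccm_pool[CCM_POOL_SIZE];
static size_t g_ccm_used;

/* Hypothetical allocator: hands out 8-byte aligned chunks in order.
 * Returns NULL when the request is zero or does not fit.
 */
void *ccm_stack_alloc(size_t size)
{
  void *p;

  size = (size + 7u) & ~(size_t)7u;
  if (size == 0 || size > CCM_POOL_SIZE - g_ccm_used)
    {
      return NULL;
    }

  p = &g_ccm_pool[g_ccm_used];
  g_ccm_used += size;
  return p;
}

/* Bytes still available for further stacks. */
size_t ccm_stack_free_bytes(void)
{
  return CCM_POOL_SIZE - g_ccm_used;
}
```

Because the main heap never touches this region, every malloc()ed buffer stays DMA-capable, while thread stacks no longer eat into the normal RAM heap.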

                            --
                            Petteri

