Asked By Fredo
28-Jul-08 01:14 PM

First of all, I apologize for cross-posting. I posted this on the
framework.interop group as well, but haven't received a response yet, so I
thought I'd try here...
I'm trying to write an app that performs some work using the CUDA SDK (for
programming NVIDIA GPUs).
One of the first methods I wrote is simply to query information about the
CUDA devices (video cards).
The code is very straightforward, but I'm getting an error that indicates
some sort of memory corruption, and I'm not sure how it's happening. I
don't think it's CUDA related; I think something is wrong with my
interop calls. Maybe the arrays? The exception is thrown during
the return from GetCUDADeviceProperties (see the implementation below and the
comment on the line that throws). It appears to happen while the data is
being marshalled back. The exception is:
in NeuralNetTimingTest.exe
Additional information: Attempted to read or write protected memory. This is
often an indication that other memory is corrupt."
When I step through the code, GetCUDADeviceProperties() appears to properly
set the fields in the CUDADevice structure.
Here's the relevant code:
From C#:
* The device information structure is this:
[StructLayout(LayoutKind.Sequential)]
public struct CUDADevice
{
    public int deviceID;
    public int totalMem;
    public int numMultiProcessors;
    public int numCores;
    public int constantMem;
    public int sharedMem;
    public int registersPerBlock;
    public int warpSize;
    public int numThreadsPerBlock;
    public int[] maxBlockDimensions;
    public int[] maxGridDimensions;
}
* The DllImport is:
[DllImport("NeuralNetCUDALib")]
extern static bool GetCUDADeviceProperties(ref CUDADevice device);
* And the C# code that calls it is:
// Allocate the device structure
CUDADevice dev = new CUDADevice();
// Allocate the two arrays
dev.maxBlockDimensions = new int[3];
dev.maxGridDimensions = new int[3];
// Device to query
dev.deviceID = 0;
GetCUDADeviceProperties(ref dev);
* The non-CUDA C++ code is:
* From the .h, the C++ side of the CUDADevice structure
typedef struct tagCUDADevice
{
    int deviceID;
    int totalMem;
    int numMultiProcessors;
    int numCores;
    int constantMem;
    int sharedMem;
    int registersPerBlock;
    int warpSize;
    int numThreadsPerBlock;
    int maxBlockDimensions[3];
    int maxGridDimensions[3];
} CUDADevice;
extern "C" bool GetCUDADeviceProperties(CUDADevice *pDevice)
{
    if (GetDeviceCount() <= 0)
    {
        return false;
    }
    if (!QueryDeviceInfo(pDevice->deviceID,
                         pDevice->totalMem,
                         pDevice->numMultiProcessors,
                         pDevice->numCores,
                         pDevice->constantMem,
                         pDevice->sharedMem,
                         pDevice->registersPerBlock,
                         pDevice->warpSize,
                         pDevice->numThreadsPerBlock,
                         pDevice->maxBlockDimensions,
                         pDevice->maxGridDimensions))
    {
        return false;
    }
    return true; // Exception happens during this return!
}
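One way to test the suspicion about the arrays is to pin down the byte layout the native side expects. The sketch below is a standalone C++ check, not part of the original library. It copies the struct from the .h and adds a hypothetical CUDADeviceWithPointers type that models what the layout would look like if each array field were passed as a pointer-sized reference instead of three inline ints. Whether the .NET marshaller actually does that for an unattributed int[] field is an assumption here. If it does, the usual remedy on the C# side is to put [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)] on both array fields so they are laid out inline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Copy of the native struct from the .h: both arrays are stored inline.
typedef struct tagCUDADevice
{
    int deviceID;
    int totalMem;
    int numMultiProcessors;
    int numCores;
    int constantMem;
    int sharedMem;
    int registersPerBlock;
    int warpSize;
    int numThreadsPerBlock;
    int maxBlockDimensions[3];
    int maxGridDimensions[3];
} CUDADevice;

// Hypothetical comparison type (an assumption, not in the original code):
// the same nine scalars, but each array replaced by a pointer-sized field.
struct CUDADeviceWithPointers
{
    int scalars[9];
    int *maxBlockDimensions;
    int *maxGridDimensions;
};

// Checks that the native struct is 15 tightly packed ints:
// 9 scalars followed by two inline arrays of 3 ints each.
void verifyNativeLayout()
{
    assert(sizeof(CUDADevice) == 15 * sizeof(int));
    assert(offsetof(CUDADevice, maxBlockDimensions) == 9 * sizeof(int));
    assert(offsetof(CUDADevice, maxGridDimensions) == 12 * sizeof(int));
}

// Prints both sizes so the mismatch is visible on the current platform.
void printLayouts()
{
    std::printf("inline arrays : %zu bytes\n", sizeof(CUDADevice));
    std::printf("pointer fields: %zu bytes\n", sizeof(CUDADeviceWithPointers));
}
```

If the buffer handed to GetCUDADeviceProperties follows the pointer-style layout, the six array ints written by the native code land where two pointers are expected, which would be consistent with a failure while the data is marshalled back.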
* And the actual CUDA code is:
bool QueryDeviceInfo(int deviceNum,
                     int &totalMem,
                     int &numMultiProcessors,
                     int &numCores,
                     int &constantMem,
                     int &sharedMem,
                     int &registersPerBlock,
                     int &warpSize,
                     int &numThreadsPerBlock,
                     int *maxBlockDimensions,
                     int *maxGridDimensions)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, deviceNum);
    totalMem = (int) prop.totalGlobalMem;
    numMultiProcessors = (int) prop.multiProcessorCount;
    numCores = (int) numMultiProcessors * 8;
    constantMem = (int) prop.totalConstMem;
    sharedMem = (int) prop.sharedMemPerBlock;
    registersPerBlock = (int) prop.regsPerBlock;
    warpSize = (int) prop.warpSize;
    numThreadsPerBlock = (int) prop.maxThreadsPerBlock;
    maxBlockDimensions[0] = (int) prop.maxThreadsDim[0];
    maxBlockDimensions[1] = (int) prop.maxThreadsDim[1];
    maxBlockDimensions[2] = (int) prop.maxThreadsDim[2];
    maxGridDimensions[0] = (int) prop.maxGridSize[0];
    maxGridDimensions[1] = (int) prop.maxGridSize[1];
    maxGridDimensions[2] = (int) prop.maxGridSize[2];
    return true;
}
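A side note on the casts in QueryDeviceInfo, separate from the access violation: the memory fields of cudaDeviceProp such as totalGlobalMem are size_t, so a plain (int) cast can wrap to a negative number on a card with 2 GiB of memory or more. The sketch below shows one possible guard. clampToInt is a hypothetical helper name, not something from the CUDA SDK or the original code, and saturating at INT_MAX is only one of several reasonable policies (widening the field to a 64-bit type on both sides is another).

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper (not in the original code): converts a byte count
// to int, saturating at INT_MAX instead of wrapping around.
int clampToInt(std::size_t value)
{
    const std::size_t maxInt = static_cast<std::size_t>(INT_MAX);
    return value > maxInt ? INT_MAX : static_cast<int>(value);
}

// Small self-check of the boundary cases.
void checkClampToInt()
{
    const std::size_t maxInt = static_cast<std::size_t>(INT_MAX);
    assert(clampToInt(0) == 0);
    assert(clampToInt(maxInt) == INT_MAX);
    assert(clampToInt(maxInt + 1u) == INT_MAX);
}
```

With such a helper, the assignment would read totalMem = clampToInt(prop.totalGlobalMem); and the same applies to the other memory-size fields.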