I fixed the installation documentation (wiki, blog, README, etc.).
After fixing some ocland bugs, I tested it with the LuxMark benchmark. The reference test case is the LuxBall scene, run on the local computer without ocland, with the following result:
2 x GeForce GTX 295 = 2627
After that, I launched the ocland server on my laptop, which is connected over Wi-Fi (theoretically the worst situation):
2 x GeForce GTX 295 + GeForce 8400M GS (using ocland) = 2935
I did not have much time to test it further, but some points are worth noting:
- ocland uses all the available network bandwidth (1.3 MB/s over Wi-Fi in my case).
- That is not enough… During the execution up to 13 simultaneous data transfers are running at the same time, and the network becomes the bottleneck.
- For this case the results seem good enough. Nevertheless, they may be affected by an artifact: I noticed that at the end of the simulation a lot of data is still arriving from the server, so essentially it is as if the laptop device had run for longer than the other devices.
I could not test it on other platforms yet, so I can’t be sure that it is an NVidia driver bug, but it seems that…
When clCreateImage2D is called inside a thread (I used pthreads, of course), the generated object is not accessible (you can’t call clGetMemObjectInfo, clGetImageInfo, clReleaseMemObject, or anything else on it), inside or outside the thread: any such call ends in a segmentation fault. But if the object is created outside the thread, it can be accessed from the main process, or from any subprocess you want.
I reported it here:
ocland was designed as a multithreaded server, but for now I have simply moved to a serial server (for the planned number of clients I think this will not be a problem).
If someone could confirm this (on NVidia or on other hardware), that would be nice!
Thanks to SourceForge I created a web page for the project:
Why am I working on the web page? Simply, I want to release the first ocland alpha version soon, really soon…
OpenCL allows two ways to read buffer data: blocking or non-blocking. The first is essentially similar to the method used internally in ocland to exchange data, but the second, which I call asynchronous buffer reading, is considerably more complex. This is the solution I developed:
- Server and client must exchange all the parameters (command queue, buffer, sizes, …).
- Server must allocate memory and call clEnqueueReadBuffer, regaining control immediately because it is a non-blocking operation.
- Then the server opens a new socket on a new port and reports it to the client (so the asynchronous memory transfer can’t interfere with subsequent client commands).
- Client opens a new connection to the proposed socket.
- Server and client each start a new thread so that execution is not blocked while the data is transferred.
- Server waits until OpenCL reports that the memory is available.
- Server sends the data to the client in the new thread using the new connection.
- Server marks the work as done.
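The steps above can be sketched roughly as follows. This is only an illustration, not ocland's actual code: a `socketpair` stands in for the new socket on a new port, a condition variable stands in for the OpenCL event that fires when the non-blocking read has finished, and every name (`async_read_job`, `async_read_client`, `server_sender`, `client_receiver`, `async_read_demo`, `READ_SIZE`) is hypothetical. Error handling is kept minimal.

```c
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_SIZE 4096  /* hypothetical buffer size */

/* Server-side state for one pending asynchronous read (hypothetical). */
typedef struct {
    int fd;                         /* dedicated data connection */
    unsigned char data[READ_SIZE];  /* memory allocated by the server */
    int ready;                      /* stand-in for the OpenCL event */
    int done;                       /* set when the work is finished */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} async_read_job;

/* Client-side state: where to store the incoming bytes. */
typedef struct {
    int fd;
    unsigned char *dst;
    size_t got;
} async_read_client;

/* Server thread: wait for the data, send it, mark the work as done. */
static void *server_sender(void *arg)
{
    async_read_job *job = arg;
    size_t sent = 0;

    pthread_mutex_lock(&job->lock);
    while (!job->ready)
        pthread_cond_wait(&job->cond, &job->lock);
    pthread_mutex_unlock(&job->lock);

    while (sent < READ_SIZE) {
        ssize_t n = send(job->fd, job->data + sent, READ_SIZE - sent, 0);
        if (n <= 0)
            break;
        sent += (size_t)n;
    }
    pthread_mutex_lock(&job->lock);
    job->done = (sent == READ_SIZE);
    pthread_mutex_unlock(&job->lock);
    return NULL;
}

/* Client thread: receive into the host buffer without blocking the caller. */
static void *client_receiver(void *arg)
{
    async_read_client *cli = arg;

    while (cli->got < READ_SIZE) {
        ssize_t n = recv(cli->fd, cli->dst + cli->got, READ_SIZE - cli->got, 0);
        if (n <= 0)
            break;
        cli->got += (size_t)n;
    }
    return NULL;
}

/* Runs the whole exchange once; returns 0 on success, -1 on failure. */
int async_read_demo(unsigned char *host_buffer)
{
    int sv[2];
    pthread_t server_thread, client_thread;
    async_read_job job;
    async_read_client cli;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    memset(&job, 0, sizeof job);
    job.fd = sv[0];
    pthread_mutex_init(&job.lock, NULL);
    pthread_cond_init(&job.cond, NULL);
    cli.fd = sv[1];
    cli.dst = host_buffer;
    cli.got = 0;

    pthread_create(&server_thread, NULL, server_sender, &job);
    pthread_create(&client_thread, NULL, client_receiver, &cli);

    /* Both sides are free to do other work here. Later, the "device"
     * finishes filling the buffer and the event fires. */
    memset(job.data, 0xAB, READ_SIZE);
    pthread_mutex_lock(&job.lock);
    job.ready = 1;
    pthread_cond_signal(&job.cond);
    pthread_mutex_unlock(&job.lock);

    pthread_join(server_thread, NULL);
    pthread_join(client_thread, NULL);
    close(sv[0]);
    close(sv[1]);
    pthread_mutex_destroy(&job.lock);
    pthread_cond_destroy(&job.cond);
    return (job.done && cli.got == READ_SIZE) ? 0 : -1;
}
```

Calling `async_read_demo(buffer)` with a `READ_SIZE`-byte buffer fills it from the "server" side while the caller's thread stays free between starting the transfer and joining the workers. In a real server the condition variable would be replaced by waiting on the event returned by clEnqueueReadBuffer.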
For the proposed scheme to work, I needed to develop ocland_event, a layer over the OpenCL cl_event entity. This new layer stores the associated cl_event and a status flag. Since both cl_event and ocland_event are pointers, the client never knows that the event it received is in reality an ocland_event, and uses it as a true cl_event. When the client waits on the event, the server knows that it must wait first for the ocland_event and then for the internally stored cl_event.
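A minimal sketch of such a wrapper might look like the code below. It is an assumption-laden illustration, not the real ocland definition: `cl_event_t` is a stand-in type so the snippet compiles without the OpenCL headers, the status values and function names are hypothetical, and `stub_wait_cl` takes the place of a real call such as clWaitForEvents.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for cl_event so the sketch builds without <CL/cl.h>. */
typedef struct stand_in_cl_event *cl_event_t;

/* Hypothetical status values for the flag. */
enum { OCLAND_EVENT_PENDING = 0, OCLAND_EVENT_COMPLETE = 1 };

/* The wrapper: the associated OpenCL event plus a status flag. */
typedef struct ocland_event_st {
    cl_event_t event;
    atomic_int status;
} *ocland_event;

/* Wraps an OpenCL event; the result is handed to the client as if it
 * were a plain cl_event, since both are just pointers. */
ocland_event ocland_event_create(cl_event_t inner)
{
    ocland_event e = malloc(sizeof *e);
    if (!e)
        return NULL;
    e->event = inner;
    atomic_init(&e->status, OCLAND_EVENT_PENDING);
    return e;
}

/* Called by the server once its own part of the work is finished. */
void ocland_event_complete(ocland_event e)
{
    atomic_store(&e->status, OCLAND_EVENT_COMPLETE);
}

/* Waits first for the wrapper, then for the stored OpenCL event.
 * wait_cl is a stand-in for the real OpenCL wait call. */
int ocland_event_wait(ocland_event e, int (*wait_cl)(cl_event_t))
{
    if (!e)
        return -1;
    while (atomic_load(&e->status) != OCLAND_EVENT_COMPLETE)
        sched_yield();
    return e->event ? wait_cl(e->event) : 0;
}

/* Hypothetical stand-in for clWaitForEvents; always succeeds. */
int stub_wait_cl(cl_event_t ev)
{
    (void)ev;
    return 0;
}
```

The polling loop with `sched_yield` is only the simplest way to express "wait for the flag"; a condition variable would avoid burning CPU time while waiting.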
There is a slightly tricky point on the server side when OpenCL kernel arguments are passed: if the server has platforms that use OpenCL < 1.2, the type of the argument the client is passing can’t be queried, so it is impossible to know whether the argument is a memory object (buffers, images, samplers, …).
In the OpenCL specification these object types are defined as pointers, a property that ocland uses intensively to speed up memory transfers as much as possible, but with the disadvantage that a pointer must be tested in order to know whether it is valid. If a client passes an invalid pointer (intentionally or not), ocland is able to test it and return an error, avoiding a segmentation fault in almost all cases.
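The post does not say how the pointer test is done. One common approach, assumed here purely for illustration, is for the server to keep a table of the handles it has created itself and to compare incoming pointers against that table without ever dereferencing them. All the names below (`handle_registry`, `registry_add`, `registry_is_valid`, `registry_remove`, `MAX_HANDLES`) are hypothetical.

```c
#include <stddef.h>

#define MAX_HANDLES 64  /* hypothetical fixed capacity */

/* Table of handles the server itself created and has not yet released. */
typedef struct {
    const void *handles[MAX_HANDLES];
    size_t count;
} handle_registry;

/* Records a newly created handle; returns 0 on success, -1 if full or NULL. */
int registry_add(handle_registry *r, const void *h)
{
    if (!h || r->count == MAX_HANDLES)
        return -1;
    r->handles[r->count++] = h;
    return 0;
}

/* Returns 1 if the pointer is a known handle, 0 otherwise.
 * The pointer is only compared, never dereferenced. */
int registry_is_valid(const handle_registry *r, const void *h)
{
    size_t i;

    for (i = 0; i < r->count; i++)
        if (r->handles[i] == h)
            return 1;
    return 0;
}

/* Forgets a released handle; returns 0 on success, -1 if it was unknown. */
int registry_remove(handle_registry *r, const void *h)
{
    size_t i;

    for (i = 0; i < r->count; i++) {
        if (r->handles[i] == h) {
            r->handles[i] = r->handles[--r->count];
            return 0;
        }
    }
    return -1;
}
```

With a table like this, a request carrying an unknown pointer can be answered with an error code instead of being forwarded to the OpenCL platform.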
When an OpenCL object is passed as a kernel argument, ocland can’t test it, and if it is an invalid pointer, some OpenCL platforms can crash (because they are unable to tell whether the pointer is valid), shutting down the server.
To handle this situation, ocland-server now launches an independent thread for each connected client, which also improves speed somewhat. If a thread crashes, the server can handle the situation and free the lost slot.
Topics for the first official release:
- OpenCL specification
  - Platform APIs: Done!
  - Device APIs: Done!
  - Context APIs: Done!
  - Command Queue APIs: Done!
  - Memory Object APIs: Done!
  - Sampler Object APIs: Done!
  - Program Object APIs: Done!
  - Kernel Object APIs: Done!
  - Event Object APIs: Done!
  - Enqueued Commands APIs: Pending
  - Extensions: Pending
  - Deprecated: Pending
- Autotools: For the moment only a Code::Blocks IDE project has been created.
- Git development repository: Done!
- Web media: I created this blog… For the current development status, that is enough.