CPUs are inherently very capable and impose few limitations on what can be done, so being compatible with them is easy, though certainly still an important consideration.
Because GPUs currently have the best price-to-gigaflops ratio (around $0.08 per gigaflop as of January 2015), they are the most important type of device to be compatible with.
Whereas GPUs are very fast, FPGAs are very malleable, in fact the most malleable, so it is good to be compatible with them as well. To that end, we'll seek SystemC compatibility.
However, considering that other important technologies are likely to eventually replace GPUs as price-performance leaders, it is good to keep those possibilities in mind.
Quantum computers are where we are headed, and they offer many new and wonderful opportunities. They also have a unique way of doing computations, which is most easily emulated on current hardware with reversible circuits.
Reversible circuits may be a step we pass through once three-dimensional processors are commonplace and before quantum computers arrive, as reversible circuits can lower the amount of heat that is generated.
Mutation compatibility is also important for generating algorithms automatically. There is a sweet spot between the simplicity of the language and the functionality it provides, where it is both easy to generate correct programs and those programs have sufficient ability to accomplish a variety of tasks.
JavaScript compatibility is important because JavaScript is the most portable programming language in the world, as it works on every device with a modern web browser.
One notable limitation of SystemC is the lack of floating-point numbers. Only fixed-point numbers are enabled by default, since floating-point numbers require a large number of circuits to implement.
This somewhat opens the door to exploring various partial number representation schemes, though it is safe to say we should stick to unsigned integers as much as possible, with signed ones where necessary.
Another limitation is that FPGAs don't have true loops, as all loops are unrolled and inlined. So any loop must have a limited number of iterations in order to fit on the hardware.
Thus while and until loops aren't viable, only for loops with well-defined iteration counts and boundaries, so they can be unrolled.
Somewhat of a feature of FPGA programming is that input and output are expressly defined. This is similar to human languages, where the dative marks the output.
Fredkin and Toffoli gates are universal reversible gates from which all other gates can be constructed.
The Toffoli gate is a controlled flip of bits, and the Fredkin gate is a controlled swap of bits; both of them can be supported.
It may also be possible to mark something in source as reversible, in which case one could arrive at the input based on the output.
This may save some hassle: if one writes a reversible encoder, it can likely also decode, simply by pushing the bits backwards.
Implementing reversible computing at a functional level is somewhat similar to dataflow programming, in that each function "consumes" its input; however, to make it reversible, the output has to carry enough information to reverse the operation. For example, an addition would modify one of the input values but leave the other unmodified, making it possible to "undo" the addition using subtraction.
Brainplus command | meaning | C command | SPEL command |
---|---|---|---|
program start | establishes a modifiable array for processing | char array[0x8000] = {0}; char *ptr = array; | |
> | increment data pointer | ++ptr; | |
< | decrement data pointer | --ptr; | |
+ | increment byte at data pointer | ++*ptr; | |
- | decrement byte at data pointer | --*ptr; | |
[ | if byte at data pointer is zero, jump to matching closing square bracket | while (*ptr) { | |
] | if byte at data pointer is non-zero, jump to preceding matching opening square bracket | } | |
. | output byte at data pointer | putchar(*ptr); | |
, | input byte to data pointer | *ptr = getchar(); | |
$ | overwrite byte in storage with byte at data pointer | | |
! | overwrite byte at data pointer with byte in storage | | |
0-F | set byte at data pointer to a multiple of 16 | | |
a-z | call function a-z based on location in code | | |
@ | exit program, or if in a function, return to last position in main program and restore state | | |
Each warp gets a sentence, or several sentences. Those parts of it which are numbers it can solve immediately; those parts which are variables it has to wait on. Once a sentence has been solved, and its SSA output variable has been filled with a number, other sentences that depend on that variable can proceed.
For purposes of source code, compound words can be used for variable names, but for bytecode purposes those compounds will be converted to click-words, so there will only be a single word per variable. There are at least hundreds, possibly thousands, of SPEL-legal click-words, so there should be quite a sufficient amount. The sentence would, however, have to have enough space for the full variable to be expressed in the code.
Could also have pointers to the data in the code, though that may need something like pointer-inactive and pointer-active states. For instance, it could be a variable name when inactive and a pointer when active.
A code segment is a chapter. The idea is that any output to global memory occurs at the end of the chapter, and any input from global memory occurs at the beginning of a chapter.
For instance, in a Reduce instruction, at the beginning each warp may load an appropriate variable from global memory. But then some of them will have to wait while others compute earlier results. Then the final sentence would write the result to global memory.
In a Map operation, however, each warp would write to its corresponding global memory, so beginning and end refer to time rather than place.
In any case, the idea is that there should be no cross-chapter references, except via global memory input at the start or global memory output at the end.
Loops are traditionally implemented with jumps for control flow, but they can be done using different variables: ones starting with a lateral click "8" are a loop, and once the loop is finished they can be replaced with a dental click "!".
To show that we are within a loop, the initial bit of the index can be 0, with the final line of the loop having an initial bit of 1. Blocks which have no dependencies also have an initial index bit of 1.
A 16-bit pointer could address the whole shared memory space and half the constant space. Considering that some GPUs don't have constant space, or split constant and shared space, it should be sufficient. CPUs have L1 cache, which is usually around 32KB nowadays, so 16-bit addressing should be more than enough.
It may actually be best to limit ourselves to only the first 32KB to fit in the L1 cache. That would leave 16KB of code, or around 512 blocks, with up to 16 instructions per block, though more realistically 8; so approximately 4096 instructions or phrases per chapter. An average sentence consists of 3 phrases, so that is around a thousand sentences per chapter.
For human reading, a chapter is best kept between the 5-minute attention span of a toddler and the 20-minute attention span of a focused adult. A common rule of thumb is 10 minutes.
Reading speed for most languages is between 155 and 213 words per minute, and there are up to 15 words per block, so a human could read about 12 blocks a minute, 120 blocks in 10 minutes, or 240 blocks in 20 minutes. So only 8KB would be needed even for a 20-minute chapter.
Currently the encoder and its tests combined come to around 6 thousand words, or a 31-minute read, though a bunch of those are compound names of variables and functions, and there are several stages in the program. A 16KB code area would allow for 40-minute chapters if necessary.
The code could be held in constant space, allowing for probably even 32KB of code, with simply a listing of the variables in the shared memory space.
For instance, a variable could be given a number and a length. There could also be a bit index at the start of the code in order to find where the variable is more quickly. That way there wouldn't be any need for search-and-replace in multiple locations: once the variable is initialized, it is done. There would also have to be a flag to indicate that a variable has been initialized. Admittedly this kind of system would be harder to debug than the aforementioned one, and would be less elegant. However, the aforementioned one would require that the sentence have enough space for the whole variable.
Also, the variable lookup table version means the code doesn't get modified. Modifying the code can be problematic, though sometimes it is good, such as for reflective metaprogramming.
Each level has more restrictions than the previous level, so code written at the most restricted level will also work on the previous levels.
The main chapter is the only one that will have input and output. The compiler will complain if the main chapter is too large and suggest splitting out other chapters.
Each function also has limits. Ideally no function should comprise more than 60 rows and 60 columns, with the ideal being closer to 56 on both counts, as that is the most stable isotope of iron, and also about how many lines fit on a page and how many characters per line is conducive to the quickest reading. That gives a maximum of about 52 blocks per function, though SPEL is a highly compressed language, so this can be rounded down to 48 blocks, which is an even number in both dozenal and hexadecimal, and would probably still hold slightly more words than fit on a piece of paper when translated into just about any language other than SPEL's interlingua.
Considering that, at least 10 functions or pages fit in a chapter.
Since there is no input or output except in the main chapter, there has to be a different way of reporting errors and making assertions. One way is to have an output variable which takes any error messages, similar to stderr. These could be channeled up like exceptions until they reach the main chapter.
Exceptions are thrown using the apprehensive mood, try is the deliberative mood, and catch is the hypothetical mood.
Error reporting in this way does have some overhead, so it may be possible to turn it off, like assertions, at least for production code.
Also, unlike exceptions, it will probably take a bit longer to propagate to the main thread, as a check would be installed at each output of a function to see whether it has produced an error; if so, the function itself exits, forwarding the error until it is caught somewhere up the stack.
The quote is probably the most complicated part of the encoding. It is best for it to come at the beginning of the phrase; that way the length of the rest can be known, and it can all be loaded up before being assigned to a register designated by the case.
The quote starts with the Quote Flag, which is 30 and takes up 5 bits. The following 3 bits give the length.
The next bit indicates whether it is a signed or unsigned value.
The next two bits show the width of the vector, if it is one; otherwise, for a scalar, they are a point of redundancy and hold the same width as the width previously pointed to.
The last 4 bits give the named type information.
An example is some ASCII text, "hello world\n": its width is 12 bytes, the closest container is 16 bytes, and it is a constant, signed value, a vector of 8-bit scalars, that is, text.
If it were in UTF-8, it would be an unsigned value. UTF-16 wouldn't fit the whole statement in 16 bytes, though it would be a text vector made of 16-bit-wide scalars.
Probably one of the main points of this programming language is that a program can be written in one language and become available in all other languages.
Taking the example of gettext: simply by having a different kind of quote, such as one that quotes legal SPEL utterances, text can be translated automatically based on locale.
Core operations
Extended operations depend on the domain. Other library functions could also be used. Here are some examples of builtins that are provided by assembly languages:
and others, which may be found among the GCC builtins.
The idea is to have only around 20 functions.