NetBSD Bans AI-Generated Code

erek

[H]F Junkie
Joined
Dec 19, 2005
Messages
11,044
"NetBSD committers are now banned from using any AI-generated code from ChatGPT, CoPilot, or other AI tools. Time will tell how this plays out with both their users and core team."If you commit code that was not written by yourself, double check that the license on that code permits import into the NetBSD source repository, and permits free distribution," reads NetBSD's updated commit guidelines. "Check with the author(s) of the code, make sure that they were the sole author of the code and verify with them that they did not copy any other code. Code generated by a large language model or similar technology, such as GitHub/Microsoft's Copilot, OpenAI's ChatGPT, or Facebook/Meta's Code Llama, is presumed to be tainted code, and must not be committed without prior written approval by core.""

Source: https://tech.slashdot.org/story/24/05/17/007240/netbsd-bans-ai-generated-code
 
Good move. If you accept AI code for a couple of years and the courts then decide that machine-learning training on other code mixes up those licenses, you would be screwed.

Keep in mind that NetBSD is not GPLed, but coding AIs are usually trained on code under licenses that include the GPL.
 
How do they know the difference?

They have a close-knit community of approved committers who can be relied on to play by the rules.

I also think that AI code generation will not be very effective in operating system code in the first place.
 
llama3 says (there are humorous parts):

>>> How can I know if code I'm using was AI generated?

A great question!

Determining whether code is AI-generated or not can be challenging, but here are some potential red flags
and indicators that might suggest AI-generated code:

1. **Unnatural syntax**: AI models might generate code with unusual syntax, such as inconsistent
indentation, unorthodox variable names, or unconventional coding styles.

2. **Lack of comments**: AI-generated code often lacks comments, documentation, or explanations, which is
uncommon in human-written code.

3. **Inconsistent naming conventions**: AI models may use random or arbitrary naming conventions for
variables, functions, or classes, unlike the consistent naming schemes humans typically follow.

4. **Unusual data structures**: AI-generated code might employ unusual data structures, such as complex
nested objects or arrays, which are less common in human-written code.

5. **Overuse of abstract concepts**: AI models may rely heavily on abstract concepts like interfaces,
generics, or lambda expressions, which can be indicative of machine-generated code.

6. **Unnatural function names**: Function names that seem arbitrary, unusual, or unrelated to the
functionality they implement might suggest AI generation.

7. **Lack of testing or debugging**: AI-generated code often lacks thorough testing and debugging, as
these processes require human intuition and understanding.

8. **Unusual control flow**: Code with unexpected control flows, such as nested loops or conditional
statements that seem overly complex, could be a sign of AI generation.

9. **Uncommon libraries or frameworks**: AI models might use unfamiliar or obscure libraries or
frameworks, which can indicate machine-generated code.

10. **Code smells**: AI-generated code may exhibit "code smells" like duplicate code, long methods, or
unused variables, which are often indicative of human-written code.

Keep in mind that these indicators are not definitive proof of AI generation, as humans can also write
code with similar characteristics. To confirm, you might need to:

1. **Analyze the code's functionality**: Understand how the code works and whether it makes sense in the
context of the project.

2. **Verify coding standards**: Check if the code adheres to established coding standards, such as those
outlined by your organization or industry.

3. **Investigate the author**: If you're familiar with the author or team that wrote the code, their
reputation, and their usual coding style can provide insight into whether the code is human-generated.

If you suspect AI-generated code, it's essential to:

1. **Validate its functionality**: Ensure the code works as intended.

2. **Document and refactor (if necessary)**: Update comments, add tests, and refactor the code if it's
unclear or difficult to maintain.

3. **Report any concerns**: Inform your team or organization about potential AI-generated code,
highlighting the issues you've identified.

Remember that AI-generated code can be useful in certain situations, such as rapid prototyping or
generating boilerplate code. However, it's crucial to understand and verify the code to ensure its
quality, maintainability, and compliance with industry standards.
 
But copying someone else's code from a book, Github, or StackOverflow is A-OK.
 
But copying someone else's code from a book, Github, or StackOverflow is A-OK.
Only if those steps were taken: "Check with the author(s) of the code, make sure that they were the sole author of the code and verify with them that they did not copy any other code."

And inference-generated code is not banned outright according to the excerpt itself, only in the thread title; it just needs prior written approval from core.
 
Only if those steps were taken: "Check with the author(s) of the code, make sure that they were the sole author of the code and verify with them that they did not copy any other code."

And inference-generated code is not banned outright according to the excerpt itself, only in the thread title; it just needs prior written approval from core.

This. In the end it comes down to actually vetting the code one is using, which we know far too many "Developers" these days do not do. As noted, they literally just copy and paste random code some person on GitHub posted and go on about their day... then weeks later wonder how their company got compromised or their local machine got exploited...
 
But copying someone else's code from a book, Github, or StackOverflow is A-OK.

You don't seem to get the point: it depends on the license. If the code has a license that allows such re-use then it is indeed a-ok.

Non-GPL systems also need to make sure they don't get "infected" with GPLed code.

The BSDs are full of code copied from each other. But not from Linux since the Linux kernel is GPLed.
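
For what it's worth, a crude first pass against accidental GPL imports is just scanning candidate files for license markers before they go anywhere near the tree. This is a minimal sketch, not anything NetBSD actually does; the marker strings and usage are assumptions for illustration only.

Code:
#!/usr/bin/env python3
# Minimal sketch: flag files that mention common GPL phrases so a human can
# review the actual license before importing them into a BSD-licensed tree.
# The marker list is illustrative, not exhaustive.
import sys
from pathlib import Path

GPL_MARKERS = (
    "GNU General Public License",
    "GNU Lesser General Public License",
    "GPL-2.0",
    "GPL-3.0",
)

def scan(path: Path) -> list[str]:
    """Return the markers found near the top of the file."""
    try:
        head = path.read_text(errors="ignore")[:8192]
    except OSError:
        return []
    return [m for m in GPL_MARKERS if m in head]

def main(root: str) -> int:
    flagged = 0
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            hits = scan(path)
            if hits:
                flagged += 1
                print(f"{path}: {', '.join(hits)}")
    return 1 if flagged else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "."))

It obviously can't catch code whose headers were stripped or relicensed along the way, which is why the guidelines still lean on asking the original author.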
 
You don't seem to get the point: it depends on the license. If the code has a license that allows such re-use then it is indeed a-ok.

Non-GPL systems also need to make sure they don't get "infected" with GPLed code.

The BSDs are full of code copied from each other. But not from Linux since the Linux kernel is GPLed.
Oh, I get it, but do you really think that all contributors are that kosher with their code? And that a project's review process is able to catch it all?
 
Oh, I get it, but do you really think that all contributors are that kosher with their code? And that a project's review process is able to catch it all?

Sure. Foremost because I think that current AI-generated code could help very little with operating systems. NetBSD is not some Python middleware system that is worked on by people who struggle with the language and need help.

Review is always a tricky matter. If a rogue contributor wants to commit some GPLed code while pretending they wrote it themselves, that is hard to catch. But the same applies to all software, including commercial applications and enterprise internal systems. Code review won't catch cases of inappropriately licensed code being passed off as one's own.
 
This. In the end it comes down to actually vetting the code one is using, which we know far too many "Developers" these days do not do. As noted, they literally just copy and paste random code some person on GitHub posted and go on about their day... then weeks later wonder how their company got compromised or their local machine got exploited...

Known security problems all came from using random libraries from the 'net in your project. Not from copying code into your project.
 
And so it begins....

The Rage against the Machine(s)

But....what will happen when the Machine(s) ban the hoomAns....think about it....
:)
 
So what are the legal ramifications of AI-generated anything? Does the company that owns the AI source code own it? Or the person who entered the parameters?
 
So what are the legal ramifications of AI-generated anything? Does the company that owns the AI source code own it? Or the person who entered the parameters?

The problem is that the existing code used as training data is published under many different licenses. So what license does the code it puts out fall under?

Nobody knows yet. Courts have not decided.
 
So what are the legal ramifications of AI-generated anything? Does the company that owns the AI source code own it? Or the person who entered the parameters?
It can depend on a lot of stuff.

A human reading a lot of code and then coming up with their own code has no legal ramifications (same goes for music, books, TV, etc.). Normally, code inferred from trained material would be treated quite similarly, but the courts could come up with new rules.

Some models can regurgitate parts of people's code 1:1 (with their names in the comments and everything; they were able to make a model write out exact NYTimes articles). Because code is considered written text, by default it is the person who wrote it who owns the copyright (if it was done in the context of paid work, it will usually be the employer), and they have a say in whether it is OK to copy it or not, often expressed in the license they choose, but not necessarily.

With how clear that kind of code can be, and the amount and quality of training data and documentation, I imagine it would already be quite good. Maybe the ban is purely for show (which seems to be the case), but the notice talks about revisiting it in the near future, when they can use a model specially trained on their own code base, or when models can hold your whole code base in their active context (2 million tokens is becoming more common; maybe in a year 20 million will be a thing, which would make Copilot-style tools extremely interesting).
 
Is there even an AI currently that can make changes to existing code (as opposed to coming up with new code)? Think of fixing a bug, or adding a feature. How do you feed a piece of code into an AI and then describe what the bug is?

OS work is all about bugs and features. You practically never add new programs.
 
Is there even an AI currently that can make changes to existing code (as opposed to coming up with new code)? Think of fixing a bug, or adding a feature. How do you feed a piece of code into an AI and then describe what the bug is?
They can all change existing code.

There are many ways to feed pieces of code to them. If you use one made only for that purpose, like Copilot in VS Code/Visual Studio/JetBrains/Neovim, it will pass into the AI's current context the tab of code you are working on and a little bit more (other open tabs, files you point it at), and it may have been trained on your codebase as well.

If you use a general AI, you send your code into its context like anything else (copy-paste it, or send files if it accepts them). You can describe the code, copy-paste the compiler error, the call stack if it happens at runtime, and so on.

With some of them you can say: extract this functionality here into a different function, change the headers and the source files, and they will propose the change and you can click accept. Or: this is crashing at line 72, why? And so on.

They are usually not good enough, or don't have a large enough context window, to make anything other than the simplest small program from scratch in a single go.
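
To make the "send your code and the compiler error" part concrete, here is a minimal sketch using the OpenAI Python client. The model name, file name, and error text are placeholders, and Copilot-style editor integrations do the equivalent context-passing for you automatically.

Code:
# Minimal sketch: ask a general-purpose LLM about a bug, given a source file
# and the compiler diagnostic.  The model name, file name, and error string
# below are placeholders; assumes OPENAI_API_KEY is set in the environment.
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

source = Path("subr_example.c").read_text()  # placeholder file name
error = "subr_example.c:72: warning: 'len' may be used uninitialized"  # paste the real diagnostic

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any chat-capable model works
    messages=[
        {"role": "system",
         "content": "You are reviewing C code. Point to specific lines and explain the fix."},
        {"role": "user",
         "content": "This is crashing at line 72, why?\n\n"
                    f"Compiler output:\n{error}\n\nSource:\n{source}"},
    ],
)
print(response.choices[0].message.content)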
 
Is there even an AI currently that can make changes to existing code (as opposed to coming up with new code)? Think of fixing a bug, or adding a feature. How do you feed a piece of code into an AI and then describe what the bug is?

OS work is all about bugs and features. You practically never add new programs.

Yes, they're good at noticing certain things like copy/paste bugs where an unintended variable winds up being referenced or something. Static analyzers can also spot these kinds of bugs, but the latest LLMs are more generic - they can figure out that this is likely wrong even in languages with no such tooling.
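
A contrived example of the kind of copy/paste bug meant here (not from any real codebase): the second block was pasted from the first and the variable reference never got updated, which both static analyzers and LLM reviewers tend to flag.

Code:
# Contrived copy/paste bug for illustration; the names are made up.
def scale_boxes(widths, heights, factor):
    scaled_widths = []
    for w in widths:
        scaled_widths.append(w * factor)

    scaled_heights = []
    for h in heights:
        # Bug: this block was pasted from the loop above and still uses 'w'
        # (the last width), not 'h'.  A linter may flag the unused 'h';
        # an LLM reviewer will usually say the intent was h * factor.
        scaled_heights.append(w * factor)

    return scaled_widths, scaled_heights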

I'm always a little wary when they think they know better logic-wise, because they're hit or miss. They don't always have the full context to deduce a sane answer, and the models tend to be too eager to do something over nothing.

So in practice they have a bad habit of opting to nitpick the dumbest shit at a net loss, or being straight up wrong instead of just saying "LGTM"

Edit: although you can kinda goad them into admitting they don't know if you deliberately prompt for it. By default most of them will favor "being helpful" which they love to understand as "die trying to give a definitive answer (even if it's horse shit)."
 
Is there even an AI currently that can make changes to existing code (as opposed to coming up with new code)? Think of fixing a bug, or adding a feature. How do you feed a piece of code into an AI and then describe what the bug is?

OS work is all about bugs and features. You practically never add new programs.

For context, my friend was able to get through a grad-level computer science class by relying on ChatGPT.

Someday the only programmers will be the ones whose code gets fed into the LLMs behind these tools, and the people who improve the tools themselves.

I know that sounds scary, but after the shit I've seen people write at work, it might be fine.
 
I better get cracking on toying more with that ML framework I like...
 