That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a little more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
## Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
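That step is presumably a single Ollama command pointing at the Hugging Face repo named above; the exact Q8_0 quantization tag is an assumption on my part:

```shell
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```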
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
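The commands were presumably along these lines: install the plugin, then prompt the model using the same ID that Ollama registered (the exact model ID format is an assumption):

```shell
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```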
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
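Because the tags are visible in the output, separating the trace from the final answer in a script is trivial. A minimal sketch (the `<think>` tag format is as described above; the helper name is mine):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a model response into (chain_of_thought, answer) on the <think> tags."""
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if not match:
        return "", text  # no visible reasoning block
    return match.group(1).strip(), match.group(2).strip()

thought, answer = split_thinking(
    "<think>Okay, pelican plus walrus...</think>Here is the joke."
)
```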
> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
> [...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:
> The Joke:
> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place plates on the table, causing a few spills.
> After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
> Explanation:
> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
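Presumably the same pattern as the 8B run, against the matching unsloth 70B repo; the Q3_K_M quantization tag is my guess at a build that lands around 34GB:

```shell
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```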
## Can it draw a pelican?
I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
> the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
## Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
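Something like this should work, going by LLM's extra-openai-models.yaml format; the `api_key_name` ties to the key set in the next step, and the exact field values are assumptions:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```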
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
This won't show you the reasoning tokens, sadly. Those are provided by the API (example here) but LLM doesn't yet have a way to display them.
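If you call the API directly you can pull the reasoning out of the raw JSON yourself. Going by DeepSeek's API docs, deepseek-reasoner returns the chain of thought in a `reasoning_content` field on the message, separate from the final `content`; treat those field names as assumptions. A sketch over a sample response body:

```python
import json

def extract_reasoning(response_json: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a deepseek-reasoner chat completion body."""
    message = json.loads(response_json)["choices"][0]["message"]
    # reasoning_content holds the chain of thought, content the final answer
    # (assumed field names, per DeepSeek's API documentation).
    return message.get("reasoning_content", ""), message.get("content", "")

sample = json.dumps({
    "choices": [{"message": {
        "reasoning_content": "First, consider the pelican...",
        "content": "Here is the joke.",
    }}]
})
reasoning, answer = extract_reasoning(sample)
```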