<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Caught in the Net (Posts about python)</title><link>francescomecca.eu</link><description></description><atom:link href="francescomecca.eu/categories/python.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2020 &lt;a href="mailto:francescomecca.eu"&gt;Francesco Mecca&lt;/a&gt; </copyright><lastBuildDate>Wed, 29 Jan 2020 10:04:36 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Interpolation using a genetic algorithm</title><link>francescomecca.eu/blog/2016/5/15/genetic-alg/</link><dc:creator>Francesco Mecca</dc:creator><description>&lt;div&gt;&lt;p&gt;This weekend I was in Milan to get a visa and I had the opportunity to work with a friend, Michele, on genetic algorithms.
It was the first time I had dug into such a field, and it was very exciting.
In this post I want to explain some bits of our work.&lt;/p&gt;
&lt;h3&gt;A brief introduction to GA&lt;/h3&gt;
&lt;p&gt;A genetic algorithm is a search/optimization algorithm that uses a heuristic approach to reduce the search space and gradually evolve toward a solution.&lt;/p&gt;
&lt;h5&gt;Population&lt;/h5&gt;
&lt;p&gt;It is an algorithm rooted in Charles Darwin's theory of natural selection.
The main components of a GA are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the population, which holds all the candidate solutions available at a given time;&lt;/li&gt;
&lt;li&gt;the fitness function, which gives an approximation of the quality of the solution encoded by a given member of the population.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In a GA the first thing to do is to generate a population.&lt;/p&gt;
&lt;p&gt;A population is a group of objects (members) with given attributes; each member contains, in some form, a candidate solution, usually encoded as a string. The first population is randomly generated and contains a large number of solutions, but not every possible solution (this is not a brute-force approach).&lt;/p&gt;
&lt;p&gt;After this step, the fitness function evaluates the quality of the solution that each member carries; the evaluation should be considered from a bottom-up point of view.&lt;/p&gt;
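&lt;p&gt;To make these two components concrete, here is a minimal Python sketch that is not taken from our program: members are bit strings, and a toy fitness function simply counts the 1 bits (the classic "OneMax" problem). The names random_member, initial_population and fitness are hypothetical.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import random

def random_member(gene_length, rng):
    """Return a random gene: a string of '0' and '1' characters."""
    return "".join(rng.choice("01") for _ in range(gene_length))

def initial_population(population_size, gene_length, seed=None):
    """Randomly sample population_size genes; the search space is not enumerated."""
    rng = random.Random(seed)
    return [random_member(gene_length, rng) for _ in range(population_size)]

def fitness(gene):
    """Toy OneMax fitness: the number of '1' bits (higher is better)."""
    return gene.count("1")

population = initial_population(population_size=8, gene_length=10, seed=1)
# Rank the members by fitness, best first.
ranked = sorted(population, key=fitness, reverse=True)
print(ranked[0], fitness(ranked[0]))
```
&lt;/pre&gt;
&lt;p&gt;Eight members cover at most 8 of the 1024 possible 10-bit genes, which is exactly why a GA is not a brute-force search.&lt;/p&gt;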
&lt;h5&gt;Reproduction&lt;/h5&gt;
&lt;p&gt;Now, as in Darwin's theory of evolution, the members of the population "reproduce": two members are coupled to generate a new member of the second generation, and every child carries a solution built from the genes of its parents.&lt;/p&gt;
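&lt;p&gt;Single-point crossover is one common way to combine the genes of two parents. This post does not say which operator our script uses, so treat the hypothetical crossover function below as an illustrative choice: the child takes a prefix of one parent and the remaining part of the other.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import random

def crossover(parent_a, parent_b, rng=random):
    """Single-point crossover on two genes of equal length (at least 2)."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have genes of the same length")
    cut = rng.randrange(1, len(parent_a))  # cut point strictly inside the gene
    return parent_a[:cut] + parent_b[cut:]

child = crossover("0000000000", "1111111111", random.Random(3))
print(child)
```
&lt;/pre&gt;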
&lt;p&gt;This time the reproduction of the population into a second one is not entirely random: the fitness function gives us an approximation of the quality of the genes each member carries and, following the rule of "survival of the fittest", the probability that a member reproduces with another one is proportional to the quality of its genes.&lt;/p&gt;
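&lt;p&gt;Fitness-proportional ("roulette wheel") selection can be sketched with Python's random.choices, whose weights argument performs the proportional draw. roulette_select is a hypothetical helper, and it assumes non-negative fitness values, as in our fitness function, which clamps every score at 0.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import random

def roulette_select(population, fitness, rng=random):
    """Pick one member with probability proportional to its (non-negative) fitness."""
    weights = [fitness(member) for member in population]
    if sum(weights) == 0:
        return rng.choice(population)  # no information: fall back to a uniform pick
    return rng.choices(population, weights=weights, k=1)[0]

rng = random.Random(7)
scores = {"a": 1, "b": 3}
picks = [roulette_select(["a", "b"], scores.get, rng) for _ in range(10000)]
print(picks.count("b") / len(picks))  # expected to be close to 3 / (1 + 3) = 0.75
```
&lt;/pre&gt;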
&lt;p&gt;Once we have a second generation, we can run the GA again to generate a third one, and keep iterating until the population converges to a solution shared by every member, or at least to one that suits our needs.&lt;/p&gt;
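&lt;p&gt;Putting selection and crossover together, the generational loop could look like the sketch below (again a toy OneMax example, not our interpolation script). The evolve function and its elitism step, which copies the best member unchanged into the next generation, are assumptions of this sketch; elitism guarantees that the best fitness never decreases from one generation to the next.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import random

def evolve(population, fitness, generations, rng):
    """Run a bare-bones generational GA and return the last population.

    Each new generation keeps the current best member (elitism) and is
    filled with children of fitness-proportional parents (single-point crossover).
    """
    for _ in range(generations):
        weights = [fitness(m) + 1e-9 for m in population]  # tiny offset avoids an all-zero total
        children = [max(population, key=fitness)]
        while len(children) != len(population):
            a, b = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, len(a))
            children.append(a[:cut] + b[cut:])
        population = children
    return population

def onemax(gene):
    return gene.count("1")

rng = random.Random(42)
start = ["".join(rng.choice("01") for _ in range(20)) for _ in range(30)]
end = evolve(start, onemax, generations=40, rng=rng)
print(max(map(onemax, start)), max(map(onemax, end)))
```
&lt;/pre&gt;
&lt;p&gt;A fixed number of generations is the simplest stopping rule; checking whether every member carries the same gene would match the convergence criterion described above.&lt;/p&gt;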
&lt;h5&gt;Mutation&lt;/h5&gt;
&lt;p&gt;In some cases a mutation function is added as well so that, as in the real world, the genes are sometimes "scrambled" independently of the fitness function.&lt;/p&gt;
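&lt;p&gt;For bit-string genes, a common mutation operator flips each bit independently with a small probability. This is a minimal sketch with a hypothetical mutate function; our script declares mutationProbability = 0.1, but this post does not show how it applies it.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import random

def mutate(gene, probability, rng=random):
    """Flip each bit independently with the given probability, ignoring fitness."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[bit] if probability > rng.random() else bit for bit in gene)

print(mutate("0000000000", 0.1, random.Random(5)))
```
&lt;/pre&gt;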
&lt;p&gt;There is more to a GA (for example, the possible ways of storing the genes inside a member, or when to use mutation), but I want to stop here and continue with an analysis of my problem.&lt;/p&gt;
&lt;h3&gt;Interpolating a function using a GA&lt;/h3&gt;
&lt;p&gt;Michele and I decided to spend some time writing a small Python script to explore what a GA can do, and we chose to interpolate some points on a Cartesian plane.&lt;/p&gt;
&lt;p&gt;Our program, which is available &lt;a href="http://francescomecca.eu:3000/pesceWanda/interpol_genetica"&gt;here&lt;/a&gt;, uses a class to define the members of the population, with a string for the genes, and another class for the points on the plane.&lt;/p&gt;
&lt;p&gt;The fitness function is not as precise as it should be because this is only a proof of concept:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;&lt;span&gt;&lt;/span&gt;mutationProbability = 0.1
rangeLimit = 5
def fitness(item, pointList, n):
value = 0
for p in pointList:
y = 0
for i in range(n):
y += item.gene[i] * pow(p.x, i)
result = 1 - (abs (p.y - y) / rangeLimit)
if result &amp;lt; 0:
result = 0
value += result
return value / n
&lt;/pre&gt;
&lt;p&gt;item is a member of the population, pointList is the list of points and n is the number of points (n - 1 is the degree of the interpolating polynomial, whose n coefficients are stored in item.gene).&lt;/p&gt;
&lt;pre class="code literal-block"&gt;&lt;span&gt;&lt;/span&gt;for i in range(n):
y += item.gene[i] * pow(p.x, i)
&lt;/pre&gt;
&lt;p&gt;This piece of code evaluates the polynomial encoded in the genes, gene[0] + gene[1]·x + ... + gene[n-1]·x^(n-1), at each point of pointList.&lt;/p&gt;
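&lt;p&gt;As a side note, Horner's method (a swapped-in technique, not what our script does) evaluates the same polynomial with one multiplication and one addition per coefficient and no pow calls. Both hypothetical helpers below return the same value.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
def evaluate_naive(gene, x):
    """Same sum as the loop in fitness(): gene[0] + gene[1]*x + ... + gene[n-1]*x**(n-1)."""
    return sum(gene[i] * pow(x, i) for i in range(len(gene)))

def evaluate_horner(gene, x):
    """Horner's method: ((gene[n-1]*x + gene[n-2])*x + ...)*x + gene[0]."""
    y = 0
    for coefficient in reversed(gene):
        y = y * x + coefficient
    return y

print(evaluate_naive([1, 2, 3], 2), evaluate_horner([1, 2, 3], 2))  # 17 17
```
&lt;/pre&gt;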
&lt;pre class="code literal-block"&gt;&lt;span&gt;&lt;/span&gt;result = 1 - (abs (p.y - y) / rangeLimit)
if result &amp;lt; 0:
result = 0
&lt;/pre&gt;
&lt;p&gt;Here the script scores each point as 1 minus the distance between the point and the evaluated function, scaled by rangeLimit, and clamps negative scores to 0. If the GA has yielded a good result, the distance between the evaluated function and the points should be 0, so every point scores 1 and the fitness function assigns that member the highest possible reproduction probability.
At the end, the fitness function returns the total score divided by the number of points evaluated.&lt;/p&gt;
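&lt;p&gt;To try the function outside our program, here is a self-contained sketch. Point and Member are hypothetical stand-ins (named tuples) for the classes our script uses, and the clamp is written as max(0, ...), which is equivalent to the if statement above.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
from collections import namedtuple

# Hypothetical stand-ins for the classes of our script.
Point = namedtuple("Point", ["x", "y"])
Member = namedtuple("Member", ["gene"])

rangeLimit = 5  # distance at which a point stops contributing to the score

def fitness(item, pointList, n):
    """Mean over the points of max(0, 1 - |p.y - f(p.x)| / rangeLimit)."""
    value = 0
    for p in pointList:
        y = sum(item.gene[i] * pow(p.x, i) for i in range(n))
        value += max(0, 1 - abs(p.y - y) / rangeLimit)
    return value / n

points = [Point(0, 0), Point(1, 4), Point(2, 9)]
exact = Member([0, 3.5, 0.5])  # y = 3.5x + 0.5x^2 passes through all three points
far = Member([100, 0, 0])      # y = 100 is more than rangeLimit away from every point
print(fitness(exact, points, 3), fitness(far, points, 3))  # 1.0 0.0
```
&lt;/pre&gt;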
&lt;p&gt;As you can see, this fitness function is by no means optimal: the reproduction probability can be higher for functions that cross some points and are very distant from others than for functions that are close to every point but cross none.
Still, for simple cases the GA yields good results: for the points (0, 0), (1, 4), (2, 9), one of the members with the highest reproduction probability has this function in its genes:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;&lt;span&gt;&lt;/span&gt;-0.0487839869993989 * x^0 + 4.600339125358671 * x^1 + -0.2780958075230644 * x^2
&lt;/pre&gt;
&lt;p&gt;which passes through the points (0, -0.0488), (1, 4.2735), (2, 8.0395), after 80 iterations with an initial population of 600 members and a two-digit approximation.&lt;/p&gt;
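&lt;p&gt;The sketch below rechecks these numbers with the same scoring rule, rewritten on plain (x, y) tuples with hypothetical names. It also illustrates the weakness mentioned above with two made-up polynomials: one passes exactly through two points and misses the third by 9, the other misses every point by 1.8, yet the first one gets the higher fitness.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
def evaluate(gene, x):
    """gene[0] + gene[1]*x + gene[2]*x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(gene))

def fitness(gene, points, rangeLimit=5):
    """Mean over the points of max(0, 1 - |error| / rangeLimit)."""
    scores = [max(0, 1 - abs(y - evaluate(gene, x)) / rangeLimit) for x, y in points]
    return sum(scores) / len(points)

points = [(0, 0), (1, 4), (2, 9)]

# The evolved member quoted above.
evolved = [-0.0487839869993989, 4.600339125358671, -0.2780958075230644]
print([round(evaluate(evolved, x), 4) for x, _ in points])  # [-0.0488, 4.2735, 8.0395]
print(round(fitness(evolved, points), 4))                    # 0.9145

# Made-up polynomials illustrating the weakness of this fitness function.
crosses_two = [0, 8, -4]        # 8x - 4x^2: exact at x = 0 and 1, off by 9 at x = 2
close_to_all = [1.8, 3.5, 0.5]  # 1.8 + 3.5x + 0.5x^2: off by 1.8 at every point
print(fitness(crosses_two, points) > fitness(close_to_all, points))  # True
```
&lt;/pre&gt;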
&lt;p&gt;For a more precise result, a larger population and a much higher number of iterations should be used.&lt;/p&gt;&lt;/div&gt;</description><category>AI</category><category>Genetic algorithm</category><category>PesceWanda</category><category>programming</category><category>python</category><guid>francescomecca.eu/blog/2016/5/15/genetic-alg/</guid><pubDate>Sun, 15 May 2016 00:00:00 GMT</pubDate></item><item><title>Kyuss Music Player</title><link>francescomecca.eu/blog/2016/4/17/kpd-player/</link><dc:creator>Francesco Mecca</dc:creator><description>&lt;div&gt;&lt;p&gt;For a long time I used the Clementine music player on my workstation. Recently I reinstalled Gentoo on my desktop and wanted to avoid installing Qt libraries of any sort.
So I switched to &lt;a href="https://www.musicpd.org/"&gt;mpd&lt;/a&gt; and fell in love with it. It is very flexible and fast, and a lot of community software has been built around it.
For some weeks I used mpc as my primary mpd client, but I was not satisfied with it. Even though it is minimal and supports every feature mpd offers, its search feels uncomfortable because it is case sensitive and needs artist, album, etc. flags before every entry.
This is why I wrote kpd together with Francesco Gallà.&lt;/p&gt;
&lt;h3&gt;Kyuss Player Client&lt;/h3&gt;
&lt;p&gt;kpd is an acronym for Kyuss Player Client because we listened to nothing but &lt;a href="https://en.wikipedia.org/wiki/Kyuss"&gt;Kyuss&lt;/a&gt; while programming this client.
We reimplemented the search functions to suit our habits: no more case sensitivity, and the 'artist', 'album', 'title' flags are optional.
kpd accepts a single string as the search argument and implements optional filter arguments that narrow the search in a grep-like way.
I invite you to read the &lt;a href="http://francescomecca.eu:3000/pesceWanda/kpd"&gt;readme&lt;/a&gt; in my git repository to understand how the search works.
In this post, instead, I want to explain some bits of the code.&lt;/p&gt;
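&lt;p&gt;The following is a hypothetical sketch of that search behaviour, not kpd's actual code: songs are plain dictionaries of tags (a simplified stand-in for the mpd database), a song matches when the query appears case-insensitively in any of its tags, and each filter narrows the previous results, like piping grep into grep.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
def matches(song, text):
    """True if text occurs, ignoring case, in any tag of the song."""
    text = text.lower()
    return any(text in str(value).lower() for value in song.values())

def search(songs, query, filters=()):
    """Songs matching query, narrowed by every filter in turn."""
    results = [song for song in songs if matches(song, query)]
    for text in filters:
        results = [song for song in results if matches(song, text)]
    return results

songs = [
    {"artist": "Kyuss", "album": "Welcome to Sky Valley", "title": "Gardenia"},
    {"artist": "Kyuss", "album": "Welcome to Sky Valley", "title": "Demon Cleaner"},
    {"artist": "Kyuss", "album": "Blues for the Red Sun", "title": "Green Machine"},
]
print([s["title"] for s in search(songs, "KYUSS", filters=["sky"])])
# ['Gardenia', 'Demon Cleaner']
```
&lt;/pre&gt;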
&lt;h4&gt;Main&lt;/h4&gt;
&lt;p&gt;The main kpd file, invoked when the command is run in the console, is kpd.py.
The most interesting lines in this file, IMHO, are these:&lt;/p&gt;
&lt;pre class="code literal-block"&gt;&lt;span&gt;&lt;/span&gt; for el in argsOrder:
if dictArgs[el] != False:
client.update_status ()
methodToCall = getattr (util, el)
retUtil = methodToCall (client, dictArgs[el], searchRes)
&lt;/pre&gt;
&lt;p&gt;argsOrder is a list of the arguments on the command line in the order the user wrote them.
kpd uses a dictionary that stores, for every argument, the value the user passed; the argument name itself is the name of the util function that getattr looks up and invokes.
In this way a new argument can be added without writing any other line of code in the main file. We used this method to avoid switch-like solutions.&lt;/p&gt;
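&lt;p&gt;A runnable sketch of the pattern: util is simulated here with types.SimpleNamespace, and the search and play functions are hypothetical, since kpd's real util.py functions are not shown in this post. Supporting a new command then only means adding a function with the matching name and signature.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import types

# Hypothetical stand-in for util.py: every function takes (client, value, searchRes).
def search(client, value, searchRes):
    searchRes.append(value)
    return "search " + str(value)

def play(client, value, searchRes):
    return "play " + str(value)

util = types.SimpleNamespace(search=search, play=play)

def run(argsOrder, dictArgs, client=None):
    """Call the util function named after each argument, in the order the user typed them."""
    searchRes = []
    results = []
    for el in argsOrder:
        if dictArgs[el] != False:  # arguments left at False were not given
            methodToCall = getattr(util, el)
            results.append(methodToCall(client, dictArgs[el], searchRes))
    return results

print(run(["search", "play"], {"search": "kyuss", "play": True}))
# ['search kyuss', 'play True']
```
&lt;/pre&gt;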
&lt;h4&gt;Util&lt;/h4&gt;
&lt;p&gt;util.py is easy to read. It contains every function that can be invoked by a command-line argument, and every function has the same 'prototype' (signature) so that all of them can be called with the method explained above.
To implement the &lt;code&gt;no-output&lt;/code&gt; and &lt;code&gt;output&lt;/code&gt; functions I used a class:
to suppress console output, the program assigns to &lt;em&gt;sys.stdout&lt;/em&gt; a dummy object that saves the original stdout in a variable and replaces the write and flush methods with ones that just pass, so no output is written.
To re-enable output after suppression, the program simply reassigns the original value to sys.stdout.&lt;/p&gt;
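&lt;p&gt;A minimal sketch of that class, with hypothetical names (DummyOutput, no_output, output) rather than kpd's exact code. The standard library's contextlib.redirect_stdout offers a similar effect as a context manager.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import sys

class DummyOutput:
    """Replacement for sys.stdout whose write and flush do nothing."""

    def __init__(self):
        self.original = sys.stdout  # keep the real stdout to restore it later

    def write(self, text):
        pass

    def flush(self):
        pass

def no_output():
    """Silence everything printed from now on."""
    sys.stdout = DummyOutput()

def output():
    """Restore the stdout saved by no_output(), if output is currently suppressed."""
    if isinstance(sys.stdout, DummyOutput):
        sys.stdout = sys.stdout.original

no_output()
print("this line is swallowed")
output()
print("this line is printed")
```
&lt;/pre&gt;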
&lt;h4&gt;Database Search&lt;/h4&gt;
&lt;p&gt;The search functions are in MPDdatabase.py.
Originally we intended to simply read the whole mpd database, which is stored compressed in the home directory, and import it into a dictionary.
The resulting list of dictionaries stores every entry related to each song; if any of them matches the search string or the filter string (considering flags, if any), the song is printed and saved in a list so that it can be added by the add function.
This approach proved very effective in terms of precision, but it lacked speed: for a database of about 77 thousand songs (about 550k lines), a search query could take almost 2 seconds.
To speed up the search we used the pickle module, which lets kpd dump the in-memory data structure holding the database to a file that can later be read back with the &lt;code&gt;pickle.load&lt;/code&gt; function.
In this way a search takes about 40 milliseconds on the same database, at the cost of about 16 MiB of disk space.&lt;/p&gt;
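&lt;p&gt;A sketch of this caching, assuming a hypothetical parse_database callable that does the slow parsing; kpd's real code and file names may differ. This version never refreshes the cache, so a real client would also need to rebuild it when the mpd database changes, for example by comparing modification times.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import os
import pickle
import tempfile

def load_database(cache_path, parse_database):
    """Return the parsed database, reading it from a pickle cache when one exists."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)  # fast path: a single unpickling step
    songs = parse_database()  # slow path: parse the whole mpd database
    with open(cache_path, "wb") as f:
        pickle.dump(songs, f, protocol=pickle.HIGHEST_PROTOCOL)
    return songs

# Usage with a fake parser that records how often it runs.
calls = []

def fake_parse():
    calls.append(1)
    return [{"artist": "Kyuss", "title": "Gardenia"}]

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "database.pickle")
    first = load_database(cache, fake_parse)
    second = load_database(cache, fake_parse)
print(first == second, len(calls))  # True 1
```
&lt;/pre&gt;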
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;This was really fun. It was our first hands-on Python project and the first real program we wrote since we started learning programming at university.
I discovered that programming helps me relax and that it is really cool to have custom software for activities you do every day.
The source for our program is stored in my git &lt;a href="http://francescomecca.eu:3000/pesceWanda/kpd"&gt;here&lt;/a&gt; and you are free to modify it.&lt;/p&gt;&lt;/div&gt;</description><category>mpd</category><category>music player</category><category>PesceWanda</category><category>programming</category><category>python</category><guid>francescomecca.eu/blog/2016/4/17/kpd-player/</guid><pubDate>Sun, 17 Apr 2016 00:00:00 GMT</pubDate></item><item><title>The Buridan's donkey in python</title><link>francescomecca.eu/blog/2016/4/2/buridan_donkey/</link><dc:creator>Francesco Mecca</dc:creator><description>&lt;div&gt;&lt;p&gt;During the final weeks of my exam session I started reading a bit about python 3 using an excellent book: &lt;a href="http://www.diveintopython.net/"&gt;Dive into Python&lt;/a&gt;.
When I noted that python uses the &lt;a href="https://en.wikipedia.org/wiki/Mersenne_Twister"&gt;Mersenne Twister PRNG&lt;/a&gt; as well I decided to write another version of my &lt;a href="http://francescomecca.eu/index.php/archives/207"&gt;Buridan's donkey program&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;&lt;span&gt;&lt;/span&gt; &lt;span class="s s-Atom"&gt;import&lt;/span&gt; &lt;span class="s s-Atom"&gt;random&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s s-Atom"&gt;sys&lt;/span&gt;
&lt;span class="s s-Atom"&gt;if&lt;/span&gt; &lt;span class="k"&gt;__&lt;/span&gt;&lt;span class="s s-Atom"&gt;name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s s-Atom"&gt;'__main__':&lt;/span&gt;
&lt;span class="s s-Atom"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="s s-Atom"&gt;if&lt;/span&gt; &lt;span class="o"&gt;not&lt;/span&gt; &lt;span class="s s-Atom"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="s s-Atom"&gt;stdin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isatty&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt;
&lt;span class="s s-Atom"&gt;for&lt;/span&gt; &lt;span class="s s-Atom"&gt;line&lt;/span&gt; &lt;span class="s s-Atom"&gt;in&lt;/span&gt; &lt;span class="s s-Atom"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nn"&gt;stdin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="s s-Atom"&gt;if&lt;/span&gt; &lt;span class="s s-Atom"&gt;line&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;is&lt;/span&gt; &lt;span class="s s-Atom"&gt;'\n':&lt;/span&gt;
&lt;span class="s s-Atom"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s s-Atom"&gt;line&lt;/span&gt;&lt;span class="p"&gt;[:-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="s s-Atom"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s s-Atom"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nn"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="s s-Atom"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s s-Atom"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="s s-Atom"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="s s-Atom"&gt;argRange&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s s-Atom"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="s s-Atom"&gt;for&lt;/span&gt; &lt;span class="s s-Atom"&gt;i&lt;/span&gt; &lt;span class="s s-Atom"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s s-Atom"&gt;argRange&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s s-Atom"&gt;:&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s s-Atom"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s s-Atom"&gt;'.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s s-Atom"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s s-Atom"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s s-Atom"&gt;args&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;
&lt;/pre&gt;
&lt;p&gt;This script works differently from the C++ one:
rather than shuffling a list built from the arguments, it repeatedly pops a random entry from the list until the list is empty.&lt;/p&gt;
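&lt;p&gt;Both approaches yield a random ordering of the entries. The sketch below, with hypothetical helper names, puts the script's pop-a-random-entry loop next to Python's random.shuffle, which permutes a list in place; random.sample(entries, k=len(entries)) would be a third option. Popping from the middle of a Python list takes time linear in its length, so the pop loop is quadratic overall, which makes no practical difference for a handful of choices.&lt;/p&gt;
&lt;pre class="code literal-block"&gt;
```python
import random

def pop_order(entries, rng=random):
    """The script's approach: repeatedly pop a random remaining entry."""
    remaining = list(entries)  # copy, so the caller's list is left untouched
    order = []
    while remaining:
        order.append(remaining.pop(rng.randrange(len(remaining))))
    return order

def shuffle_order(entries, rng=random):
    """The shuffling approach: permute a copy of the list in place."""
    order = list(entries)
    rng.shuffle(order)
    return order

choices = ["pizza", "sushi", "kebab"]
print(pop_order(choices, random.Random(1)))
print(shuffle_order(choices, random.Random(1)))
```
&lt;/pre&gt;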
&lt;p&gt;Not yet satisfied, I also wrote a Telegram bot using the &lt;a href="https://github.com/eternnoir/pyTelegramBotAPI"&gt;telebot library&lt;/a&gt; that works like the script above, but inside the Telegram app.
The bot can be added to your contact list by simply searching for &lt;a href="http://telegram.me/duridan_donkey_bot"&gt;@duridan_donkey_bot&lt;/a&gt; (yes, a typo!)&lt;/p&gt;
&lt;p&gt;All the code is open source and can be found on my GitHub page.&lt;/p&gt;
&lt;p&gt;Francesco Mecca&lt;/p&gt;&lt;/div&gt;</description><category>buridan donkey</category><category>mersenne twister</category><category>PesceWanda</category><category>python</category><category>random</category><guid>francescomecca.eu/blog/2016/4/2/buridan_donkey/</guid><pubDate>Sat, 02 Apr 2016 00:00:00 GMT</pubDate></item></channel></rss>