Python: deleting specified lines from multiple text files

Say there are several text files, 1.txt 2.txt 3.txt 4.txt 5.txt, whose contents look roughly like this:
12124242----0000
12345678----1111
23456789----2222
34567890----3333
40404044----4444
Now I create a new file, 0.txt, and write one or more lines of data in it. For example:
12345678
23456789
34567890
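The script should then automatically look for matches in 1.txt 2.txt 3.txt 4.txt 5.txt. If a line in any of those files matches one of the entries above, delete that entire line and save the deleted content to a newly created file. If nothing matches, ignore it. Check every 30 seconds.

Right now, every time I need to delete a few lines I have to open each file, search, and delete by hand, which is too tedious. Any help would be appreciated.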
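The matching rule described in the question can be sketched on in-memory data before any files are involved. This is only an illustration written for Python 3 (the thread itself targets Python 2.6); the name `split_lines` is hypothetical, and "matches" is assumed to mean "the line starts with one of the entries from 0.txt".

```python
def split_lines(lines, keys):
    """Return (kept, removed); a line is removed if it starts with any key."""
    # Skip blank keys: an empty string is a prefix of every line.
    prefixes = tuple(k for k in keys if k)
    kept, removed = [], []
    for line in lines:
        # str.startswith accepts a tuple and is true if any prefix matches.
        if prefixes and line.startswith(prefixes):
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed


sample = [
    "12124242----0000\n",
    "12345678----1111\n",
    "23456789----2222\n",
    "34567890----3333\n",
    "40404044----4444\n",
]
kept, removed = split_lines(sample, ["12345678", "23456789", "34567890"])
print("kept:", kept)
print("removed:", removed)
```

With the sample data from the question, the first and last lines are kept and the three middle lines are removed.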

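Answer 1  2013-03-05
The code is based on Python 2.6. The functionality is written as functions with simple syntax, so it should be easy to follow.
The name of each new file automatically gets "_back" appended. Ask again if anything is unclear.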
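As a small illustration of the "_back" naming rule mentioned above, here is a sketch in Python 3. The helper name `backup_name` is an assumption made for this example; `os.path.splitext` is the standard-library call that splits off the extension.

```python
import os


def backup_name(file_name, suffix="_back"):
    """Insert the suffix between the base name and the extension."""
    base, ext = os.path.splitext(file_name)  # "1.txt" -> ("1", ".txt")
    return base + suffix + ext


print(backup_name("1.txt"))
```

A name without an extension simply gets the suffix appended at the end.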

import os, time

def readKeys(fileName):
    keys = []
    f = open(fileName, "r")
    while True:
        line = f.readline()
        if not line:
            break
        key = line.strip()
        if key: keys.append(key)
    f.close()
    return keys

def processKeys(editFileName, backFileName, keys):
    f = open(editFileName, "r")
    lines = f.readlines()
    f.close()

    editLines = []
    backLines = []

    for line in lines:
        found = False
        for key in keys:
            if line.startswith(key):
                backLines.append(line)
                found = True
                break
        if not found:
            editLines.append(line)

    if backLines:
        f = open(editFileName, "w")
        f.writelines(editLines)
        f.close()
        f = open(backFileName, "w")
        f.writelines(backLines)
        print 'modify',editFileName,'save',backFileName

if __name__ == '__main__':
    keys = readKeys("0.txt")
    fileList = ["1.txt", "2.txt", "3.txt", "4.txt", "5.txt"]
    while True:
        for fileName in fileList:
            base, ext = os.path.splitext(fileName)
            processKeys(fileName, base + "_back" + ext, keys)
        print 'sleep 30 seconds'
        time.sleep(30)

Follow-up question

It ran only once and then closed by itself, showing:

modify 1.txt save 1_back.txt
Traceback (most recent call last):
  File "E:\测试\删除.py", line 47, in <module>
    processKeys(fileName, base + "_back" + ext, keys)
  File "E:\测试\删除.py", line 16, in processKeys
    f = open(editFileName, "r")
IOError: [Errno 2] No such file or directory: '5.txt'

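Also, could you please add comments? I have no background in this at all.

Follow-up answer

There is no 5.txt in your directory, is there?

Check 1.txt through 4.txt; they should already have been modified.

I'll post the comments shortly.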
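One way to keep a missing file such as 5.txt from stopping the whole run is to treat "file not found" as "nothing to do". The following is a hedged Python 3 sketch, not part of the posted answer; `read_lines_or_none` is a hypothetical helper name. In Python 2.6 the equivalent would be catching `IOError`.

```python
def read_lines_or_none(file_name):
    """Return the file's lines, or None if the file does not exist."""
    try:
        with open(file_name, "r") as f:
            return f.readlines()
    except FileNotFoundError:
        # A listed file that is absent is skipped instead of aborting the loop.
        return None


def readable_files(file_list):
    """Split a list of names into (readable, missing)."""
    readable, missing = [], []
    for name in file_list:
        if read_lines_or_none(name) is None:
            missing.append(name)
        else:
            readable.append(name)
    return readable, missing
```

A caller could print the `missing` list once per pass so that a mistyped name is still noticed.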

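Follow-up question

It works now, but it leaves rather a lot of saved files: 1_back.txt, 2_back.txt, 3_back.txt, 4_back.txt. If I go up to 10.txt, that means 10 new files for storage, which gets too messy. Just put everything that was deleted into one file. Also, the 30-second refresh doesn't seem to work: when I write something else in, nothing happens after 30 seconds.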

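Follow-up answer

This code prints nothing if no match is found.
Try editing 1.txt during the 30-second wait and adding content that matches. Then there will be output after 30 seconds.

Below is the revised code, which saves everything that was removed to a single file, "back.txt":

The post exceeds the length limit, so I can't include full comments. Ask again if anything is unclear.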

#coding:gbk
import os, time

def readKeys(fileName):
    keys = []
    f = open(fileName, "r")
    while True:
        line = f.readline()
        if not line:
            break
        # strip spaces, newlines and other whitespace from both ends of the line
        key = line.strip()
        if key: keys.append(key)
    f.close()
    return keys

# Edit the file editFileName: if a line matches one of the strings in keys,
# delete that line and save it to the file backFileName
def processKeys(editFileName, backFileName, keys):
    # read the whole content of editFileName into a list
    f = open(editFileName, "r")
    lines = f.readlines()
    f.close()

    editLines = []
    backLines = []

    for line in lines:
        found = False
        for key in keys:
            # if the line starts with key, set found to True and add the line to backLines
            if line.startswith(key):
                backLines.append(line)
                found = True
                break
        if not found:
            editLines.append(line)

    if backLines:
        # save the modified content to editFileName
        f = open(editFileName, "w")
        f.writelines(editLines)
        f.close()
        # save the deleted lines to backFileName (mode "w" overwrites earlier content)
        f = open(backFileName, "w")
        f.writelines(backLines)
        f.close()
        print 'modify',editFileName,'save',backFileName

if __name__ == '__main__':
    # read the match strings from the key file
    keys = readKeys("0.txt")
    # list of file names to edit
    fileList = ["1.txt", "2.txt", "3.txt", "4.txt", "5.txt"]
    while True:
        for fileName in fileList:
            # call processKeys to edit the file and save the deleted lines to the backup file "back.txt"
            processKeys(fileName, "back.txt", keys)
        print 'sleep 30 seconds'
        # wait 30 seconds
        time.sleep(30)

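Follow-up question

After 30 seconds I change the contents of 0.txt again, but nothing more gets deleted. It just prints "sleep 30 seconds" every 30 seconds.

Follow-up answer

So 0.txt changes too? From your question it looked like only the other files would change. Fortunately the code is written as functions, so only one statement has to move: put keys = readKeys inside the loop and this case is supported.

If you don't know what a loop is, change it like this: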

if __name__ == '__main__':
    # read the match strings from the key file
    keys = readKeys("0.txt")
    # list of file names to edit
    fileList = ["1.txt", "2.txt", "3.txt", "4.txt", "5.txt"]
    while True:
        for fileName in fileList:
            # call processKeys to edit the file and save the deleted lines to the backup file "back.txt"
            processKeys(fileName, "back.txt", keys)
        print 'sleep 30 seconds'
        # wait 30 seconds
        time.sleep(30)

becomes

if __name__ == '__main__':
    # list of file names to edit
    fileList = ["1.txt", "2.txt", "3.txt", "4.txt", "5.txt"]
    while True:
        # re-read the match strings from the key file on every pass
        keys = readKeys("0.txt")
        for fileName in fileList:
            # call processKeys to edit the file and save the deleted lines to the backup file "back.txt"
            processKeys(fileName, "back.txt", keys)
        print 'sleep 30 seconds'
        # wait 30 seconds
        time.sleep(30)
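The effect of moving the key read inside the loop can be shown with a polling loop that is bounded so it can be tried out quickly. This is a Python 3 sketch under assumptions: `read_keys`, `watch`, `handle_keys`, and `max_passes` are names invented for the example, and the thread's own loop runs forever rather than for a fixed number of passes.

```python
import time


def read_keys(file_name):
    """Return the non-blank lines of the key file, stripped of whitespace."""
    with open(file_name, "r") as f:
        return [line.strip() for line in f if line.strip()]


def watch(key_file, handle_keys, interval=30, max_passes=None):
    """Re-read key_file before every pass and hand the keys to handle_keys."""
    passes = 0
    while max_passes is None or passes < max_passes:
        # Reading inside the loop is what picks up edits made to the key file.
        keys = read_keys(key_file)
        handle_keys(keys)
        passes += 1
        if max_passes is None or passes < max_passes:
            time.sleep(interval)
```

Calling `watch("0.txt", print, interval=30)` would print the current key list every 30 seconds until interrupted; passing `max_passes=2` and `interval=0` makes it return after two immediate passes.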

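Follow-up question

The idea is that I only edit 0.txt: whatever data is in it gets matched against 1.txt to 4.txt and deleted there. However much data I want to delete, I just write all of it into 0.txt, one entry per line. Then running the py script goes into 1.txt to 4.txt and deletes the matches.
Also, saving to back.txt should append, not overwrite; in other words, earlier records must not be deleted. And the deleted data should be printed on screen every time.

Follow-up answer

Appending does indeed need a change. Done; each appended block is now preceded by the name of the file that was modified.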
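The difference between overwriting and appending comes down to the mode passed to `open`: "w" truncates the file first, "a" adds to the end. A minimal Python 3 sketch follows; `append_removed` and its `source_name` parameter are hypothetical names chosen for illustration.

```python
def append_removed(back_file_name, removed_lines, source_name=None):
    """Append removed lines to the backup file, keeping earlier records."""
    # Mode "a" creates the file if needed and never truncates existing content.
    with open(back_file_name, "a") as f:
        if source_name is not None:
            # Optional header naming the file the lines came from.
            # "\n" is enough: text mode converts it to the platform newline.
            f.write(source_name + ":\n")
        f.writelines(removed_lines)
```

Calling it twice on the same backup file leaves both batches in the file, in the order they were written.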

import os, time

def readKeys(fileName):
    keys = []
    f = open(fileName, "r")
    while True:
        line = f.readline()
        if not line:
            break
        key = line.strip()
        if key: keys.append(key)
    f.close()
    return keys

def processKeys(editFileName, backFileName, keys):
    f = open(editFileName, "r")
    lines = f.readlines()
    f.close()

    editLines = []
    backLines = []

    for line in lines:
        found = False
        for key in keys:
            if line.startswith(key):
                backLines.append(line)
                found = True
                break
        if not found:
            editLines.append(line)

    if backLines:
        f = open(editFileName, "w")
        f.writelines(editLines)
        f.close()
        print 'modify',editFileName,'append to',backFileName
        # mode "a" appends instead of overwriting
        f = open(backFileName, "a")
        # "\n" rather than os.linesep: text mode already translates newlines
        f.write(editFileName + ":" + "\n")
        f.writelines(backLines)
        f.close()

if __name__ == '__main__':
    fileList = ["1.txt", "2.txt", "3.txt", "4.txt"]
    while True:
        keys = readKeys("0.txt")
        for fileName in fileList:
            processKeys(fileName, "back.txt", keys)
        print 'sleep 30 seconds'
        time.sleep(30)

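Follow-up question

What I meant by printing the result on screen is showing the data that was extracted. Say 0.txt contains 5 numbers
123456
234567
345678
456789
567890
and 3 of them match entries in 1.txt to 4.txt; then those 3 should also be shown on the py screen. For example:
123456
234567
345678
ok

Follow-up answer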

import os, time

def readKeys(fileName):
    keys = []
    f = open(fileName, "r")
    while True:
        line = f.readline()
        if not line:
            break
        key = line.strip()
        if key: keys.append(key)
    f.close()
    return keys

def processKeys(editFileName, backFileName, keys):
    f = open(editFileName, "r")
    lines = f.readlines()
    f.close()

    editLines = []
    backLines = []

    for line in lines:
        found = False
        for key in keys:
            if line.startswith(key):
                backLines.append(line)
                # show the file name and the matched key on screen
                print editFileName, key
                found = True
                break
        if not found:
            editLines.append(line)

    if backLines:
        f = open(editFileName, "w")
        f.writelines(editLines)
        f.close()
        print 'modify',editFileName,'append to',backFileName
        f = open(backFileName, "a")
        # "\n" rather than os.linesep: text mode already translates newlines
        f.write(editFileName + ":" + "\n")
        f.writelines(backLines)
        f.close()

if __name__ == '__main__':
    fileList = ["1.txt", "2.txt", "3.txt", "4.txt"]
    while True:
        keys = readKeys("0.txt")
        for fileName in fileList:
            processKeys(fileName, "back.txt", keys)
        print 'sleep 30 seconds'
        time.sleep(30)

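Follow-up question

Sorry, one more question: please save only the data, without the file name in front. Saving it like this is enough:
123456
234567
345678
456789
rather than having this in front as well:
1.txt

234567
345678
456789

Follow-up answer

Seriously?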

import os, time

def readKeys(fileName):
    keys = []
    f = open(fileName, "r")
    while True:
        line = f.readline()
        if not line:
            break
        key = line.strip()
        if key: keys.append(key)
    f.close()
    return keys

def processKeys(editFileName, backFileName, keys):
    f = open(editFileName, "r")
    lines = f.readlines()
    f.close()

    editLines = []
    backLines = []

    for line in lines:
        found = False
        for key in keys:
            if line.startswith(key):
                backLines.append(line)
                print editFileName, key
                found = True
                break
        if not found:
            editLines.append(line)

    if backLines:
        f = open(editFileName, "w")
        f.writelines(editLines)
        f.close()
        print 'modify',editFileName,'append to',backFileName
        # append only the deleted lines, with no file-name header
        f = open(backFileName, "a")
        f.writelines(backLines)
        f.close()

if __name__ == '__main__':
    fileList = ["1.txt", "2.txt", "3.txt", "4.txt"]
    while True:
        keys = readKeys("0.txt")
        for fileName in fileList:
            processKeys(fileName, "back.txt", keys)
        print 'sleep 30 seconds'
        time.sleep(30)

This answer was accepted by the asker.
Answer 2  2013-03-05
import re

patt = re.compile(r'^(\d+).*')

def _ln2num(ln):
    try:
        return patt.match(ln).group(1)
    except AttributeError:
        return ln.strip()

def prepair_exists(filename):
    with open(filename, 'rt') as handle:
        return set(map(_ln2num, handle))

exists = set()
for filename in ('1.txt', '2.txt', '3.txt', '4.txt'):
    exists |= prepair_exists(filename)

with open('0.txt', 'rt') as handle:
    rs = filter(lambda ln: ln.strip() not in exists, handle)
with open('0.txt', 'wt') as handle:
    handle.writelines(rs)

Follow-up question

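Could you add some comments? I'm a beginner and can't follow it. Also, my Python is version 2.6, and your code above fails when I run it.

Follow-up answer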

import re

patt = re.compile(r'^(\d+).*')

def _ln2num(ln):
    """ Use a regular expression to extract the leading digit string from a line """
    try:
        return patt.match(ln).group(1)
    except:
        return ln.strip()

def prepair_exists(filename):
    """ Read the leading digit string of each line in the file and return them as a set """
    handle = open(filename, 'rt')
    return set(map(_ln2num, handle))

# Read the leading digit strings from each file in the given list
# and build the "exists" set used for the later membership test
exists = set()
for filename in ('1.txt', '2.txt', '3.txt', '4.txt'):
    exists |= prepair_exists(filename)

# read "0.txt"
handle = open('0.txt', 'rt')
# the lines that are not in the "exists" set form the list rs
rs = filter(lambda ln: ln.strip() not in exists, handle)
handle.close()

# write "rs" back to "0.txt"
handle = open('0.txt', 'wt')
handle.writelines(rs)
handle.close()
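The set idea in this answer can also be pointed the other way, so that lines are dropped from the data files rather than from 0.txt, which is what the question asks for. The following Python 3 sketch is an assumption-laden illustration, not the answerer's code: `leading_number` and `filter_data_lines` are hypothetical names, and a match here means the leading digit string equals a key exactly, which is stricter than a prefix test.

```python
import re

LEADING_DIGITS = re.compile(r"^(\d+)")


def leading_number(line):
    """Return the run of digits at the start of the line, else the stripped line."""
    m = LEADING_DIGITS.match(line)
    return m.group(1) if m else line.strip()


def filter_data_lines(data_lines, key_lines):
    """Return (kept, removed) for one data file, given the lines of the key file."""
    # A set gives constant-time membership tests, however many keys there are.
    keys = set(k.strip() for k in key_lines if k.strip())
    kept, removed = [], []
    for line in data_lines:
        if leading_number(line) in keys:
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed
```

Because the comparison is exact, a key such as 1234 would not remove a line beginning with 12345678; whether that is desirable depends on the data.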

Follow-up question

It won't run..

Follow-up answer

These two lines have probably lost one of their leading spaces:

def prepair_exists(filename):
    """ Read the leading digit string of each line in the file and return them as a set """
    handle = open(filename, 'rt')
    return set(map(_ln2num, handle))

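Follow-up question

Saving the deleted results to a separate new file and checking every 30 seconds: those two haven't been added, have they? Nor has printing the deleted results on screen..
Besides, it doesn't behave correctly when I run it. The data in 0.txt gets cleared, but the corresponding lines in the 1.txt to 4.txt files are not deleted.

Follow-up answer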

# This fragment relies on the "exists" set built by the earlier code.
dhandle = open("deleted.txt", 'wt')

# read "0.txt"
handle = open('0.txt', 'rt')
rs = []

for ln in handle.readlines():
    if ln.strip() in exists:
        # print the deleted entry on screen
        print ln.strip()
        # store the deleted entry in a separate, newly created file
        dhandle.write(ln)
    else:
        # the lines that are not in the "exists" set form the list rs
        rs.append(ln)

handle.close()

# write "rs" back to "0.txt"
handle = open('0.txt', 'wt')
handle.writelines(rs)
handle.close()

dhandle.close()

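Follow-up question

Is this complete? Why is there no 1.txt to 4.txt in it? Please post a complete version.

Answer 3  2013-03-05
I'm a beginner too; I'll post something for you to try in a bit.

Follow-up question

Thanks. Please make it fairly complete, and comment it as well. I don't know anything about this..

Follow-up answer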

import os
import time
DataPath = r'..\data'    # directory holding your '1.txt, 2.txt...'; the code scans every file in it
OutDataPath = r'..\out'  # directory where the files are saved after the given strings are removed
SourceFile = r'..\source\0.txt'
source_object = open(SourceFile)
source_of_lines = source_object.readlines()
def saveData(file, list):  # write the content to a file
    tempfile = open(os.path.join(OutDataPath, file), 'wt')
    tempfile.writelines(list)
    tempfile.close()

def main():
    try:
        files = os.listdir(DataPath)  # get the file list (1.txt 2.txt 3.txt...)
        for file in files:
            file_object = open(os.path.join(DataPath, file)).readlines()
            temp_list = open(os.path.join(DataPath, file)).readlines()
            for fileline in file_object:
                for sourcestr in source_of_lines:
                    key = sourcestr.strip()
                    # skip blank keys, which would otherwise match every line
                    if key and key in fileline:
                        print file, '====remove===', fileline
                        temp_list.remove(fileline)  # delete this line
                        break  # remove a line only once, even if several keys match it
            saveData(file, temp_list)
    except Exception, e:
        print 'error', e

if __name__ == '__main__':
    while(1):
        main()
        time.sleep(30)

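Follow-up question

The thing is, my directory may hold several hundred files, but the only ones I actually want processed are the ones whose names I specify, like 1.txt 2.txt 3.txt.

Follow-up answer

files = os.listdir(DataPath)  # get the file list (1.txt 2.txt 3.txt...)

Change this line to
files = ['1.txt','2.txt','3.txt']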
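If the fixed list should still be checked against what is really in the directory, the two approaches can be combined: list the directory once, then keep only the wanted names. A short Python 3 sketch, where `pick_files` is a hypothetical helper and not something from the answer:

```python
import os


def pick_files(directory, wanted):
    """Return the names from `wanted` that exist in `directory`, in the given order."""
    present = set(os.listdir(directory))  # every entry name in the directory
    return [name for name in wanted if name in present]
```

With several hundred unrelated files in the directory, only the names listed in `wanted` are ever opened.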

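Follow-up question

DataPath = r'..\data'    # directory holding your '1.txt, 2.txt...'; the code scans every file in it
OutDataPath = r'..\out'  # directory where the files are saved after the given strings are removed
SourceFile = r'..\source\0.txt'
Everything of mine is in the same directory; how should those 3 lines be changed?

Follow-up answer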

import time
files = ['1.txt', '2.txt', '3.txt']  # list of files to scan
SourceFile = '0.txt'
source_object = open(SourceFile)
source_of_lines = source_object.readlines()
def saveData(file, list):
    # the cleaned content goes to a new file named 'result_' + original name
    tempfile = open(('result_' + file), 'wt')
    tempfile.writelines(list)
    tempfile.close()

def main():
    try:
        for file in files:
            file_object = open(file).readlines()
            temp_list = open(file).readlines()
            for fileline in file_object:
                for sourcestr in source_of_lines:
                    key = sourcestr.strip()
                    # skip blank keys, which would otherwise match every line
                    if key and key in fileline:
                        print file, '====remove===', fileline
                        temp_list.remove(fileline)
                        break  # remove a line only once, even if several keys match it
            saveData(file, temp_list)
    except Exception, e:
        print 'error', e

if __name__ == '__main__':
    while(1):
        main()
        time.sleep(30)
        print 'sleep 30s~~~~~~~~~~~~~~~~~'

If it still doesn't work, add me on QQ: 121200406
